I tried to add commit messages when things might not be obvious, but feel free to ask questions about anything. Also, not all of the commits need to be accepted — if some are useful we can keep those and toss the rest.
Here are some other non-direct-code thoughts I had...
I expect people looking at the Rust code are going to look for some
hot-button issues.
Usage of unsafe. The Rust community generally dislikes unsafe
being used for "frivolous" purposes. Every unsafe block should
be annotated with a SAFETY comment that explains why that
specific usage is safe. This is helpful to show what aspects you
have thought for now and in the future when you revisit the
code. Any unsafe fn should have a # Safety section in its
documentation explaining what the caller needs to do to uphold the
safety invariants.
Documentation for public API functions. You don't have much here /
the public API is really from Python. It may be worth running
cargo doc --open and checking out the docs as they are.
Relatedly, how your API is organized. There's a single public
function and a handful of types, so there's not really many
concerns for organization at the moment.
Normal project health signals — user-facing documentation,
examples, tests, CI/CD, responsiveness to issues & pull requests,
number of issues & pull requests.
mmap is contentious. As best as I've ever figured out, it's not
actually safe to the letter of the law in Rust. That's because it
presents an API where data is immutable but can actually be mutated
from outside of the Rust process. It would probably be good
to have some hard numbers for "when we used mmap, the time went
from X to Y" to show that this isn't a frivolous usage of unsafe.
Numeric casts with as are usually pretty suspicious. These can
silently transform your data from something meaningful to something
not. When possible, usage of Into or TryInto is recommended.
The code appears to cast freely between u64 and usize, which
suggests that it really only works on 64-bit machines. Ideally, that
would be enforced at compile time to prevent anyone from
accidentally running this on 32-bit machines. Otherwise, the code
should guard against cases where a u64 cannot fit in a usize.
The duplication of to_vec in the match arms of
dicom_values_to_vec is ultimately required because each call is
being monomorphized for the specific type passed in. Even though the
source code is the same, the generated code differs for each
concrete type.
7FE00010 appears to be a magic value; ideally this should be a
constant with a useful name. There are a handful of similar values
floating around as well (OB, PN, etc.).
From a non-AI, non-ML only-Rust perspective, I would be curious why
there's not a separate Rust-only library, as opposed to the current
one so tightly bound to Python. We love our pure Rust cores with
wrapper layers for other languages. These aren't free as usually
there's a bit more type shuffling needed in those adapter layers.
Using files as a data transfer mechanism between the Python and the
Rust is a bit surprising. Perhaps this is mostly done for the
ability to use mmap or development velocity, but I had expected to
see sending giant blobs of bytes around.
I tried to add commit messages when things might not be obvious, but feel free to ask questions about anything. Also, not all of the commits need to be accepted — if some are useful we can keep those and toss the rest.
Here are some other non-direct-code thoughts I had...
I expect people looking at the Rust code are going to look for some hot-button issues.
Usage of
unsafe
. The Rust community generally dislikesunsafe
being used for "frivolous" purposes. Everyunsafe
block should be annotated with aSAFETY
comment that explains why that specific usage is safe. This is helpful to show what aspects you have thought for now and in the future when you revisit the code. Anyunsafe fn
should have a# Safety
section in its documentation explaining what the caller needs to do to uphold the safety invariants.Documentation for public API functions. You don't have much here / the public API is really from Python. It may be worth running
cargo doc --open
and checking out the docs as they are.Relatedly, how your API is organized. There's a single public function and a handful of types, so there's not really many concerns for organization at the moment.
Normal project health signals — user-facing documentation, examples, tests, CI/CD, responsiveness to issues & pull requests, number of issues & pull requests.
mmap
is contentious. As best as I've ever figured out, it's not actually safe to the letter of the law in Rust. That's because it presents an API where data is immutable but can actually be mutated from outside of the Rust process. It would probably be good to have some hard numbers for "when we usedmmap
, the time went from X to Y" to show that this isn't a frivolous usage of unsafe.Numeric casts with
as
are usually pretty suspicious. These can silently transform your data from something meaningful to something not. When possible, usage ofInto
orTryInto
is recommended.The code appears to cast freely between
u64
andusize
, which suggests that it really only works on 64-bit machines. Ideally, that would be enforced at compile time to prevent anyone from accidentally running this on 32-bit machines. Otherwise, the code should guard against cases where au64
cannot fit in ausize
.The duplication of
to_vec
in the match arms ofdicom_values_to_vec
is ultimately required because each call is being monomorphized for the specific type passed in. Even though the source code is the same, the generated code differs for each concrete type.7FE00010
appears to be a magic value; ideally this should be a constant with a useful name. There are a handful of similar values floating around as well (OB
,PN
, etc.).From a non-AI, non-ML only-Rust perspective, I would be curious why there's not a separate Rust-only library, as opposed to the current one so tightly bound to Python. We love our pure Rust cores with wrapper layers for other languages. These aren't free as usually there's a bit more type shuffling needed in those adapter layers.
Using files as a data transfer mechanism between the Python and the Rust is a bit surprising. Perhaps this is mostly done for the ability to use mmap or development velocity, but I had expected to see sending giant blobs of bytes around.