BioJulia / BioStructures.jl

A Julia package to read, write and manipulate macromolecular structures
Other
95 stars 22 forks source link

Speed up parsing of PDB line #50

Closed timholy closed 5 months ago

timholy commented 5 months ago

By checking the length of the line before parsing it, we can avoid the overhead of try/catch for the optional fields.

PDBTools writes data into the optional fields like "charge", so

julia> using PDBTools

julia> PDBTools.writePDB(PDBTools.wget("6HN6"), "test.pdb")

gives a file that on master yields

julia> @btime read("test.pdb", BioStructures.PDB);
  49.759 ms (44601 allocations: 2.09 MiB)

and on this branch yields

julia> @btime read("test.pdb", BioStructures.PDB);
  2.581 ms (40373 allocations: 1.96 MiB)

This is ~WIP~RFC because some of the tests reach into internals to check the error message for specific fields when supplied a too-short string. Thus a decision is needed about whether we're OK with the modified approach here; if so, perhaps those tests can just be deleted.

jgreener64 commented 5 months ago

Looks like a great change, I wasn't even aware of tryparse. The redundant tests can be deleted.

timholy commented 5 months ago

Ready to go if if tests pass and you agree.

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 94.90%. Comparing base (a70f6f4) to head (501735e). Report is 1 commits behind head on master.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #50 +/- ## ========================================== - Coverage 94.92% 94.90% -0.03% ========================================== Files 9 9 Lines 1794 1786 -8 ========================================== - Hits 1703 1695 -8 Misses 91 91 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

jgreener64 commented 5 months ago

Great, thanks for this.