Closed y1zhou closed 5 months ago
On your questions:
open_pdb_raw_with_options
as a new functions and then rewiring the original open_pdb_raw
as a call to that function. If you think this would be too inconvenient for users we might think about breaking the API.I like the idea of keeping the API as simple as possible. I think that if we can have fewer functions for reading a file, that would be better, as I agree we don't want to duplicate code and also want it to be clear for end users to know which to use.
I see that there are two kind of ways to do this:
open_pdb_with_opens(..., ReadOptions)
OpenOptions
in the std
library: https://doc.rust-lang.org/std/fs/struct.OpenOptions.htmlI was initially wanting to change it to be like option 2, but from a time perspective, I just needed to add the functionality as fast as I could, so I went with 1.
What do you think>
I think that the open options would be the more clear way of doing this and allowing further expansion in the future. So if everyone here agree I think we can make this a breaking change and move to make this the default. After this work has been done it might be worthwhile to see if we can trivially add back the original public functions with the same api and mark them as deprecated to be removed in the next breaking change (v0.13
) so that users have at least one major version to move their stuff over.
Yeah I'm already down the rabbit hole of open_*_with_options()
so let's try that first. I also like the idea of eventually deprecating the functions without ReadOptions
to avoid confusion.
Sounds good. I theory, we are able to have both the approaches of 1 and 2.
I like using the builder pattern, as it provides a good extensible framework where we can add more options without changing the API, this applies to both approaches, but I think that 2 is more idiomatic
Alright most implementations are done with the following exceptions:
ReadOptions.format
: do we keep this for route 2? Currently the input file type is determined at many different places and it might be nice to have a single source of truth.ReadOptions.capitalise_chains
: seems like it's being used in chain_iter
of open_pdb_raw_with_options
as a fallback? Not sure how the option should interact with this during parsing. Also for mmCIF files there doesn't seem to be a counterpart for this.It looks like you made a lot of new public methods for opening files.
Now is about time to decide which pub functions to keep and which ones to remove (or deprecate). Here's my two cents:
Only keep open(filename)
and open_with_options(filename, options)
.
open
would be open_with_options
with sane default options. Now that level
lives in ReadOptions
I would also suggest removing the level
argument in open()
.open_gz
can be called by setting options.decompress
in open_with_options
.open_raw
does some extra magic in filetype guessing which is not very reliable, e.g. an atom-only PDB file doesn't start with HEADER
and cannot be identified (somewhat related #122).
open_pdb_raw_with_options
and open_mmcif_raw_with_options
directly.ReadOptions.format
and let this determine which internal function to route to. open_pdb(filename)
, open_pdb_with_options(filename, options)
, open_pdb_raw_with_options(BufReader, context, options)
.open_mmcif(filename)
, open_mmcif_with_options(filename, options)
, and open_mmcif_raw_with_options(BufReader, context, options)
.On the open_*_raw
functions, currently open_pdb_raw
takes BufReader
but open_mmcif_raw
takes &str
. I think it makes more sense to just expose a function that takes BufReader
as the input can be streamed, and converting a &str
in memory to a BufReader
is trivial.
To summarize,
Function | Action |
---|---|
general::open |
Remove level argument |
general::open_gz |
Deprecate |
general::open_raw |
Deprecate |
pdb::open_pdb |
Remove level argument |
pdb::open_pdb_raw |
Deprecate |
mmcif::open_mmcif |
Remove level argument |
mmcif::open_mmcif_bufread |
Deprecate |
mmcif::open_mmcif_bufread_with_options |
Deprecate |
mmcif::open_mmcif_raw |
Deprecate |
mmcif::open_mmcif_raw_with_options |
Take BufReader instead of &str , and parse line-by-line instead |
I like the sound of those changes.
I agree that we still want to have a public function which takes a BufReader
as an input, since people may want to do some more complex tasks than getting the library to handling the opening of a file, e.g., maybe receiving over a socket or something...
These changes sound very good indeed. But how does this look on the ReadOptions
side? There is a function read
which takes an &str
right now. I would argue that adding a ReadOptions::read_raw
is enough to gain feature parity between the old system and the new system. If that function is added I would say we just expose the ReadOptions
functions. But to ease the development for all users I think exposing the exact same api as the old system makes sense, but all of these functions should be marked deprecated
.
Sounds good. I'll add the functions to ReadOptions
as well and see how it looks.
Things might be a little messy but the changes we discussed last time should be updated now. Please let me know if I missed anything!
To help you out this is the output from cargo-public-api
run as follows: cargo public-api diff 0.11.0
on your feature branch:
Removed items from the public API
=================================
-pub fn pdbtbx::open_raw<T: std::io::Read + std::io::Seek>(input: std::io::buffered::bufreader::BufReader<T>, level: pdbtbx::StrictnessLevel) -> core::result::Result<(pdbtbx::PDB, alloc::vec::Vec<pdbtbx::PDBError>), alloc::vec::Vec<pdbtbx::PDBError>>
Changed items in the public API
===============================
-pub fn pdbtbx::open(filename: impl core::convert::AsRef<str>, level: pdbtbx::StrictnessLevel) -> core::result::Result<(pdbtbx::PDB, alloc::vec::Vec<pdbtbx::PDBError>), alloc::vec::Vec<pdbtbx::PDBError>>
+pub fn pdbtbx::open(filename: impl core::convert::AsRef<str>) -> core::result::Result<(pdbtbx::PDB, alloc::vec::Vec<pdbtbx::PDBError>), alloc::vec::Vec<pdbtbx::PDBError>>
-pub fn pdbtbx::open_mmcif(filename: impl core::convert::AsRef<str>, level: pdbtbx::StrictnessLevel) -> core::result::Result<(pdbtbx::PDB, alloc::vec::Vec<pdbtbx::PDBError>), alloc::vec::Vec<pdbtbx::PDBError>>
+pub fn pdbtbx::open_mmcif(filename: impl core::convert::AsRef<str>) -> core::result::Result<(pdbtbx::PDB, alloc::vec::Vec<pdbtbx::PDBError>), alloc::vec::Vec<pdbtbx::PDBError>>
-pub fn pdbtbx::open_pdb(filename: impl core::convert::AsRef<str>, level: pdbtbx::StrictnessLevel) -> core::result::Result<(pdbtbx::PDB, alloc::vec::Vec<pdbtbx::PDBError>), alloc::vec::Vec<pdbtbx::PDBError>>
+pub fn pdbtbx::open_pdb(filename: impl core::convert::AsRef<str>) -> core::result::Result<(pdbtbx::PDB, alloc::vec::Vec<pdbtbx::PDBError>), alloc::vec::Vec<pdbtbx::PDBError>>
Added items to the public API
=============================
(ignored)
For full API compatibility there should be no functions listed in the removed and changed items section. So in this case that means keeping the open_raw
and keeping the strictnesslevel
parameter in the other open functions. As discussed all open functions we do not intend to support should be marked deprecated
.
It looks like the original open_raw
definition is not easily reproduced with the ReadOptions
as it determined the file type with the first characters being HEADER
or data_
. So if it is not possible to keep this function we could drop this one already in this release.
To help you out this is the output from
cargo-public-api
run as follows:cargo public-api diff 0.11.0
on your feature branch:
This is pretty handy! Thanks for the pointers.
For full API compatibility there should be no functions listed in the removed and changed items section. So in this case that means keeping the
open_raw
and keeping thestrictnesslevel
parameter in the other open functions. As discussed all open functions we do not intend to support should be markeddeprecated
.It looks like the original
open_raw
definition is not easily reproduced with theReadOptions
as it determined the file type with the first characters beingHEADER
ordata_
. So if it is not possible to keep this function we could drop this one already in this release.
I agree that open_raw
should be dropped and marked as a breaking change. Using the first few characters to determine the file type becomes less reliable especially with the atomic_only
modifiers being introduced.
For the other functions, adding back level
to open_pdb
and open_mmcif
makes sense as they are deprecated anyway. For open
, what would be the best approach to modify its signatures? In our previous discussions I think we agreed to let open
use all default options, and ReadOptions
to handle all other cases. Do we still want the old open(filename, level)
or just open(filename)
?
For open
lets look at the standard library. There there are very simple 'magical' functions alongside the open options, which is exactly as discussed before. So we should go for that (using open(filename)
) and we should try and make it abundantly clear how to convert existing code in the release notes.
Sounds good! We have one removed item open_raw
and one changed item open
. Hopefully it's not too much burden to the end users to migrate existing code to ReadOptions
. Looking forward to v0.12.0!
Just want to follow up on this PR and see if there's anything to be updated before this can be merged.
I propose a merge (I was awaiting a response from @OWissett).
thanks for merging. Apologies for delay, I was on holiday
This is to address #125, where we pass in
ReadOptions
to theopen()
functions instead of justStrictnessLevel
for future extensibility.ReadOptions
ReadOptions
open_gz
whenoptions.decompress
is trueoptions.discard_hydrogens
is trueoptions.only_first_model
is trueATOM
records whenoptions.only_atomic_coords
is trueSome things up for discussion:
capitalise_chains
? If I recall correctly all uppercase letters, lowercase letters, and numbers are valid chain identifiers.open_pdb_raw
andopen_mmcif_raw
? They are public functions but it seems they are only meant to be called byopen_pdb
andopen_mmcif
. If we maintainopen_pdb_raw_with_options
andopen_mcif_raw_with_options
there's going to be a lot of duplicated code (or we extract common methods into another core function).