biocpp / biocpp-io

BioC++ Input/Output library
https://biocpp.github.io
BSD 3-Clause "New" or "Revised" License
8 stars 5 forks source link

[feature] seq_io::reader and lots of required code #11

Closed h-2 closed 2 years ago

h-2 commented 2 years ago

Ok, here is some substance now!

Regarding the design of the reader-constructors and the options, my thoughts are the following:

  1. Options should be optional. But providing the format is required when constructing from a stream.
  2. If we move this to the options, it would need an extra "auto-detect" state for the filename constructor.
  3. This state makes it more difficult to define the variant yourself (see below).
  4. The options should be the same for all constructors; but the "auto-detect" state wouldn't work for the stream constructor.

I really think this would make the whole thing less readable.

I don't know if you were aware, but in contrast to the old seqan3-implementation, format choice is not a template parameter / compile-time decision. You can specify it at runtime and this is useful:

std::variant<bio::fasta, bio::fastq> format;
if (some_user_option == true)
    format = bio::fasta{};
else
    format = bio::fastq{};

bio::seq_io::reader reader{std::cin, format};

The easiest other way would look like this:

decltype(bio::seq_io::reader_options{}.selected_format) format;
if (some_user_option == true)
    format = bio::fasta{};
else
    format = bio::fastq{};

bio::seq_io::reader reader{std::cin, bio::seq_io::reader_options{.selected_format = format} };

And we still have the problem that it is possible for the developer to "forget" this, and we can only detect that at run-time. In the current model, the constructor simply requires the format argument, so a possible error is detected at compile time.

h-2 commented 2 years ago

You might want to have a look at reader_base, seq_io::reader and seq_io::reader_options.

I will add more documentation including proper snippets to the reader before merging. The tests might already be helpful for getting a feel for the interfaces.

smehringer commented 2 years ago

Before I check the code.

In the first code version: Say, fasta, fastq and genbank are valid formats for seq_io. Does the variant then have to be std::variant<fasta, fastq, genbank> if I just want fasta and fastq, no right?

If you re-write the second example to:

bio::seq_io::reader_options my_options{};
auto & format = my_options.selected_format;

if (some_user_option == true)
    format = bio::fasta{};
else
    format = bio::fastq{};

bio::seq_io::reader reader{std::cin, my_options};

It doesn't look bad I think. You don't even have to know that you need a std::variant.

h-2 commented 2 years ago

In the first code version: Say, fasta, fastq and genbank are valid formats for seq_io. Does the variant then have to be std::variant<fasta, fastq, genbank> if I just want fasta and fastq, no right?

You are correct, it needs to include all formats. The reader exports ::format_type currently that could also be used in either case, but it is not beautiful.

It doesn't look bad I think. You don't even have to know that you need a std::variant.

Yes, it looks better. It has the problem, though, that the format is usually set in a different function (argument parsing) than where the reader is initialised, so the user would have to pass-as-auto to the function, too.

It also doesn't solve the other issues I mentioned:

More generally, I think that if the validity of the "selected-format" depends on the filename/stream, than this clearly indicates that it really belongs together with the filename/stream. It is not an option of the format or the reader, it is part of "the input". We also don't put the filename/stream in the options, right?

Anyways, let's not get stuck on this for too long. If I haven't convinced you, yet, let's start a list with design questions that we return to later and/or ask other people for their opinion.

smehringer commented 2 years ago

Yes lets make a list. Maybe in the wiki?

h-2 commented 2 years ago

This now tracks seqan3/master.

@smehringer : I am still not sure whether I should add -Wno-unused-variable to our snippet.cmake or to the CI build... how does SeqAn do this?

smehringer commented 2 years ago

This now tracks seqan3/master.

@smehringer : I am still not sure whether I should add -Wno-unused-variable to our snippet.cmake or to the CI build... how does SeqAn do this?

I think SeqAn has removed the warning because snippets serve demonstrative purposes and are not "correct applications". I don't think that testing for unused variables adds a lot of value to the snippet tests.

h-2 commented 2 years ago

I think SeqAn has removed the warning because snippets serve demonstrative purposes and are not "correct applications". I don't think that testing for unused variables adds a lot of value to the snippet tests.

Yes, but where does it remove the warning and why isn't it automatically removed for our builds if we use all the SeqAn infrastructure?

smehringer commented 2 years ago

I think SeqAn has removed the warning because snippets serve demonstrative purposes and are not "correct applications". I don't think that testing for unused variables adds a lot of value to the snippet tests.

Yes, but where does it remove the warning and why isn't it automatically removed for our builds if we use all the SeqAn infrastructure?

Oh ok. I'll take a look