BioJulia / FASTX.jl

Parse and process FASTA and FASTQ formatted files of biological sequences.
https://biojulia.dev
MIT License
61 stars 20 forks source link

Rewrite for v2 #68

Closed jakobnissen closed 2 years ago

jakobnissen commented 2 years ago

Why a breaking change?

Essentially, #63 is unsolvable without making a breaking change.

I figured, if we were to break the API anyway, there were several areas where FASTX could be made nicer.

Important changes

External

Internal

closes #77 closes #73 closes #37 closes #63 closes #29

codecov[bot] commented 2 years ago

Codecov Report

Merging #68 (b828fc6) into master (f03bccf) will increase coverage by 5.89%. The diff coverage is 91.24%.

:exclamation: Current head b828fc6 differs from pull request most recent head c66d035. Consider uploading reports for the commit c66d035 to get more accurate results

@@            Coverage Diff             @@
##           master      #68      +/-   ##
==========================================
+ Coverage   84.39%   90.28%   +5.89%     
==========================================
  Files          12       11       -1     
  Lines         660      628      -32     
==========================================
+ Hits          557      567      +10     
+ Misses        103       61      -42     
Flag Coverage Δ
unittests 90.28% <91.24%> (+5.89%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/fasta/record.jl 82.14% <82.69%> (+0.89%) :arrow_up:
src/fasta/index.jl 85.31% <85.10%> (-14.69%) :arrow_down:
src/FASTX.jl 91.17% <91.04%> (-8.83%) :arrow_down:
src/fastq/reader.jl 75.86% <91.66%> (-13.50%) :arrow_down:
src/fasta/reader.jl 87.50% <92.20%> (-2.36%) :arrow_down:
src/fastq/record.jl 94.94% <94.68%> (+10.89%) :arrow_up:
src/fastq/quality.jl 95.00% <95.00%> (+9.81%) :arrow_up:
src/fasta/readrecord.jl 100.00% <100.00%> (+3.57%) :arrow_up:
src/fasta/writer.jl 100.00% <100.00%> (+3.70%) :arrow_up:
src/fastq/readrecord.jl 100.00% <100.00%> (+53.84%) :arrow_up:
... and 8 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

jakobnissen commented 2 years ago

After a little thinking, I might

kescobo commented 2 years ago

I think empty identifiers are technically valid, eg

>
ATTGC
>
CCGAC

But I don't know if I would think of this as id == "" or ismissing(id).

A FASTA record without a sequence doesn't make any sense to me.

CiaranOMara commented 2 years ago

But I don't know if I would think of this as id == "" or ismissing(id).

Either way, the end-user would still need to check and decide how to handle the information. I don't think reinterpreting the StringView does any favours.

TransGirlCodes commented 2 years ago

So, reading some webpages e.g.

here here here

I think we have an answer to this question regarding using missing or empty string returned by description(rec).

These pages seem to describe the entire '>' line as a [description|definition] line i.e. identifier + any additional info.

So, what if, we take that stance. We include identifier, as a convenience, and then, we have description() defined as giving you the entire description line - including the identifier.

So, for any record with just an ID and no additional info, will give you identifier(rec) == description(rec).

Any record with additional content after the identifier, then identifier(rec) != description(rec).

Then, because so many platforms have their own way of dealing with extra info - e.g. ncbi has the whole "[tag=value]" thing. We simply take the position "use description() to get the whole description line, parse it how you will, ya on ya own buddy."

Thus identifier becomes a subset of the description, and the behaviour of the two, is consistent.

TransGirlCodes commented 2 years ago

@jakobnissen How do you feel about this proposal of doing away with description and just providing identifier and definition (essentially renamed header)?

jakobnissen commented 2 years ago

That's a good idea. I like it. I'll implement the changes this week

jakobnissen commented 2 years ago

@SabrinaJaye @kescobo and other interested parties:

This is now ready for review/test. There is too much code to review, but you can play around with it and see if you like how it feels, and if you approve of the changes described in the OP here. I recommend reading the new, updated documentation.

Now what is needed is just nice-to-haves, which can always be added later.

The only thing left to do here before tagging v2 is just to code coverage (I will take care of that), and if @SabrinaJaye have any ideas for high-level operations.

During the next week, I will finish up the last remaining tests, then in 1-2 weeks, I will squash merge this to master unless you have any comments, and then release FASTX v2.

kescobo commented 2 years ago

Why does Documenter think you want to deploy via Travis.ci? image

kescobo commented 2 years ago

I think if you add push_preview=true to deploydocs() here, it should build a preview so we can view it online. See here.

jakobnissen commented 2 years ago

@kescobo I tried to add previews, but apparently it's failing? :/ I can't figure out why. I added a new documenter key, but the build job claims it's not there or it's empty. Maybe it's acceptable that it doesn't work for PRs, I can look at it after pushing this to master.

kescobo commented 2 years ago

I can look at it after pushing this to master.

Seems fine, I can try too. I'll build docs locally for now