Closed jakobnissen closed 2 years ago
Merging #68 (b828fc6) into master (f03bccf) will increase coverage by 5.89%. The diff coverage is 91.24%.

:exclamation: Current head b828fc6 differs from pull request most recent head c66d035. Consider uploading reports for the commit c66d035 to get more accurate results.
@@ Coverage Diff @@
## master #68 +/- ##
==========================================
+ Coverage 84.39% 90.28% +5.89%
==========================================
Files 12 11 -1
Lines 660 628 -32
==========================================
+ Hits 557 567 +10
+ Misses 103 61 -42
Flag | Coverage Δ | |
---|---|---|
unittests | 90.28% <91.24%> (+5.89%) | :arrow_up: |
Flags with carried forward coverage won't be shown.
Impacted Files | Coverage Δ | |
---|---|---|
src/fasta/record.jl | 82.14% <82.69%> (+0.89%) | :arrow_up: |
src/fasta/index.jl | 85.31% <85.10%> (-14.69%) | :arrow_down: |
src/FASTX.jl | 91.17% <91.04%> (-8.83%) | :arrow_down: |
src/fastq/reader.jl | 75.86% <91.66%> (-13.50%) | :arrow_down: |
src/fasta/reader.jl | 87.50% <92.20%> (-2.36%) | :arrow_down: |
src/fastq/record.jl | 94.94% <94.68%> (+10.89%) | :arrow_up: |
src/fastq/quality.jl | 95.00% <95.00%> (+9.81%) | :arrow_up: |
src/fasta/readrecord.jl | 100.00% <100.00%> (+3.57%) | :arrow_up: |
src/fasta/writer.jl | 100.00% <100.00%> (+3.70%) | :arrow_up: |
src/fastq/readrecord.jl | 100.00% <100.00%> (+53.84%) | :arrow_up: |
... and 8 more | | |
After a little thinking, I might prefer `Union{Nothing, UnitRange{Int}}` to be explicit about when it's empty.

I think empty identifiers are technically valid, e.g.
>
ATTGC
>
CCGAC
But I don't know if I would think of this as `id == ""` or `ismissing(id)`.
A FASTA record without a sequence doesn't make any sense to me.
> But I don't know if I would think of this as `id == ""` or `ismissing(id)`.
Either way, the end-user would still need to check and decide how to handle the information. I don't think reinterpreting the `StringView` does any favours.
So, reading some webpages e.g.
I think we have an answer to the question of whether `description(rec)` should return `missing` or an empty string.
These pages seem to describe the entire '>' line as a [description|definition] line i.e. identifier + any additional info.
So, what if we take that stance? We include `identifier` as a convenience, and then we have `description()` defined as giving you the entire description line, including the identifier.
So, any record with just an ID and no additional info will give you `identifier(rec) == description(rec)`.
For any record with additional content after the identifier, `identifier(rec) != description(rec)`.
Then, because so many platforms have their own way of dealing with extra info (e.g. NCBI has the whole "[tag=value]" thing), we simply take the position "use `description()` to get the whole description line, parse it how you will, ya on ya own buddy."
Thus the identifier becomes a subset of the description, and the behaviour of the two is consistent.
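To make the proposal concrete, here is a minimal sketch in plain Julia. It assumes the identifier is everything up to the first whitespace character of the header line (the text after `>`); the helper names `toy_identifier` and `toy_description` are hypothetical and are not the FASTX implementation.

```julia
# Hypothetical helpers for illustration only; not FASTX code.
# `header` is the text following '>' on a FASTA header line.

# The description is the whole header line, identifier included.
toy_description(header::AbstractString) = String(header)

# Assumption: the identifier ends at the first whitespace character.
function toy_identifier(header::AbstractString)
    i = findfirst(isspace, header)
    # No whitespace: the identifier is the entire header.
    i === nothing && return String(header)
    return String(header[1:prevind(header, i)])
end

bare   = "seq1"
tagged = "seq1 [organism=E. coli]"

# Header with only an ID: identifier and description coincide.
println(toy_identifier(bare) == toy_description(bare))
# Header with extra info: the identifier is a strict prefix of the description.
println(toy_identifier(tagged) != toy_description(tagged))
```

Under this sketch an empty header simply yields an empty identifier and an empty description, so no `missing` value is needed.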
@jakobnissen How do you feel about this proposal of doing away with `description` and just providing `identifier` and `definition` (essentially renamed `header`)?
That's a good idea. I like it. I'll implement the changes this week
@SabrinaJaye @kescobo and other interested parties:
This is now ready for review/test. There is too much code to review, but you can play around with it and see if you like how it feels, and if you approve of the changes described in the OP here. I recommend reading the new, updated documentation.
Now what is needed is just nice-to-haves, which can always be added later.
The only thing left to do here before tagging v2 is code coverage (I will take care of that), and to hear whether @SabrinaJaye has any ideas for high-level operations.
During the next week, I will finish up the last remaining tests, then in 1-2 weeks, I will squash merge this to master unless you have any comments, and then release FASTX v2.
Why does Documenter think you want to deploy via Travis.ci?
I think if you add `push_preview=true` to `deploydocs()` here, it should build a preview so we can view it online. See here.
@kescobo I tried to add previews, but apparently it's failing? :/ I can't figure out why. I added a new documenter key, but the build job claims it's not there or it's empty. Maybe it's acceptable that it doesn't work for PRs, I can look at it after pushing this to master.
> I can look at it after pushing this to master.
Seems fine, I can try too. I'll build docs locally for now
Why a breaking change?
Essentially, #63 is unsolvable without making a breaking change.
I figured, if we were to break the API anyway, there were several areas where FASTX could be made nicer.
Important changes
External
- `BioGenerics` methods have been removed, except the ones used for the readers/writers.
- Records can no longer be constructed from a string. Instead, use `parse(Record, str)`.
- `quality_scores` returns the qualities as a lazy, validating iterator of scores, using a default `QualityEncoding` object to decode ASCII PHRED scores to quality scores.
- Readers take a keyword `copy`, which defaults to `true`. If `false`, the reader will overwrite the same record on iteration. This makes the old `while !eof(reader)` idiom obsolete in favor of iterating over `Reader(io; copy=false)`.
- `transcribe` has been removed, as it is now trivial to do the same thing.
- Added a `faidx` function.
- `extract` can now extract parts of sequences from indexed FASTA files without loading a whole record. E.g. if you have a whole chromosome, you can load just a few basepairs without loading the entire chromosome (see #29).
- Added `validate_fasta` and `validate_fastq` to quickly and memory-efficiently check if a file is FASTX-formatted.

Internal
closes #77 closes #73 closes #37 closes #63 closes #29
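
The `copy` keyword described under the external changes can be pictured with a toy model. This is only a sketch of the semantics as stated there (a reader that either hands out a fresh record per iteration or overwrites one shared record); `ToyRecord` and `ToyReader` are hypothetical names, and nothing here reflects how FASTX implements its readers.

```julia
# Toy illustration only: `ToyRecord` and `ToyReader` are hypothetical, not FASTX types.
mutable struct ToyRecord
    line::String
end

struct ToyReader
    lines::Vector{String}
    copy::Bool
    record::ToyRecord   # shared buffer, handed out directly when copy == false
end

ToyReader(lines; copy::Bool=true) =
    ToyReader(collect(String, lines), copy, ToyRecord(""))

function Base.iterate(r::ToyReader, i::Int=1)
    i > length(r.lines) && return nothing
    r.record.line = r.lines[i]                 # overwrite the shared record
    rec = r.copy ? ToyRecord(r.record.line) : r.record
    return (rec, i + 1)
end

Base.length(r::ToyReader) = length(r.lines)
Base.eltype(::Type{ToyReader}) = ToyRecord

# copy=true (the default): every element is an independent record.
copied = collect(ToyReader(["a", "b", "c"]))
# copy=false: every element aliases the one record, which holds the last line.
reused = collect(ToyReader(["a", "b", "c"]; copy=false))

println([r.line for r in copied])
println([r.line for r in reused])
```

The practical consequence in this model is that with `copy=false` a record should be consumed inside the loop body rather than stored, since storing it keeps only a reference to the shared buffer.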