Closed lczech closed 2 years ago
Could you please attach a FASTA file resulting in such errors, and tell me what OS you are using? I couldn't reproduce the issue.
Edit: I mean the first issue.
Regarding the second issue, this is tricky from a user experience perspective: file operations (write in this case) can only return success or error, and can only communicate this to the calling program, but not directly to the user. I made the conscious choice of just silently accepting sequence "deduplication" instead of failing, but I can understand that it may be surprising.
I have no solution for now, but I'll try to find something.
Cheers,
Franklin
That's on Ubuntu 20.04.4 LTS.
This file
# good.fasta
>a
ACGT
>d
GATACA
gives
$ tree fusta/
fusta/
├── append
├── fasta
│ ├── a.fa
│ └── d.fa
├── get
├── infos.csv
├── infos.txt
├── labels.txt
└── seqs
├── a.seq
└── d.seq
4 directories, 7 files
while this file
# bad.fasta
>a/b\c?
ACGT
>d:e"f
GATACA
gives
$ tree fusta/
fusta/
├── append
├── fasta
├── get
├── infos.csv
├── infos.txt
├── labels.txt
└── seqs
4 directories, 3 files
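One possible way to handle such headers would be to map them to file-system-safe names before creating the virtual files. The following is only a rough sketch, not fusta's actual code: `sanitize_label` and the list of replaced characters are hypothetical. Strictly speaking, Linux only forbids `/` and the NUL byte in file names, so the wider set here is a conservative assumption.

```rust
/// Hypothetical helper, not part of fusta: replace characters that are
/// invalid or troublesome in file names with an underscore.
fn sanitize_label(label: &str) -> String {
    // Assumed character set; only '/' (and NUL) is strictly invalid on Linux.
    const FORBIDDEN: &[char] = &['/', '\\', '?', ':', '"', '*', '<', '>', '|'];
    label
        .chars()
        .map(|c| {
            if FORBIDDEN.contains(&c) || c.is_control() {
                '_'
            } else {
                c
            }
        })
        .collect()
}
```

With this, `a/b\c?` would become `a_b_c_` and `d:e"f` would become `d_e_f`. Two different headers can collapse to the same name this way, so a real implementation would still need to deal with collisions.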
Start with the good.fasta from above. Then vi seqs/a.seq to edit. The file starts as
ACGT
Edit this to be
ACGT
>c
CAT
then save, and unmount. This basically added a new sequence to the file, but one that only shows up when unmounting and mounting again. The same procedure can however also be used to edit nonsense into the file, which continues to work while mounted, but of course cannot be mounted again after the nonsense is written to file.
As said, that is kind of a user error, so it would be okay to ignore. But maybe this could also be checked, to improve user experience.
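Such a check could look roughly like the sketch below. It is not based on fusta's code: `validate_sequence` is a hypothetical function, and the accepted alphabet (ASCII letters plus `-` and `*`) is an assumption. The idea is just to reject content written to a seqs file if it contains a FASTA header line or other unexpected characters.

```rust
/// Hypothetical check, not fusta's actual code: accept only content that
/// looks like a single raw sequence, without any FASTA header lines.
fn validate_sequence(content: &str) -> Result<(), String> {
    for (i, line) in content.lines().enumerate() {
        // A '>' at the start of a line would begin a new FASTA record.
        if line.starts_with('>') {
            return Err(format!("line {}: unexpected FASTA header `{}`", i + 1, line));
        }
        // Assumed alphabet: ASCII letters, gap '-' and stop '*'.
        if let Some(c) = line
            .chars()
            .find(|c| !c.is_ascii_alphabetic() && *c != '-' && *c != '*')
        {
            return Err(format!("line {}: unexpected character `{}`", i + 1, c));
        }
    }
    Ok(())
}
```

For the edited file above, this would return an error pointing at line 2 (`>c`), which could then be turned into a warning or a failed write.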
Lastly, when reading a broken file, I get
[INFO] Reading good.fasta...
thread 'main' panicked at 'Duplicated keys', src/fs.rs:582:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
which could also be improved to give a more understandable error.
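As a rough illustration, and assuming the panic is triggered by two records sharing the same ID, a pre-check along these lines could report the offending ID instead of panicking. `check_unique_ids` is a hypothetical name, not something taken from fusta.

```rust
use std::collections::HashSet;

/// Hypothetical pre-check, not fusta's actual code: return the first
/// duplicated sequence ID as an error instead of panicking.
fn check_unique_ids<'a>(ids: impl IntoIterator<Item = &'a str>) -> Result<(), String> {
    let mut seen = HashSet::new();
    for id in ids {
        // `insert` returns false if the value was already in the set.
        if !seen.insert(id) {
            return Err(format!("sequence ID `{}` appears more than once", id));
        }
    }
    Ok(())
}
```

The caller could then print the message and exit with a non-zero status, rather than showing a backtrace hint.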
Hi again,
it might be beneficial to improve robustness of fusta towards user errors. Arguably, this could just be left as user error, but nonetheless it might be nice to improve on that, for more user friendliness.
For example, FASTA files containing header names with characters that are not valid in a file system (/, \, ?, etc.) will mount and give infos and labels files with the correct content, but otherwise be silently empty.
Similarly, when editing sequences to contain invalid FASTA content, such as editing a seqs file to contain multiple new sequences, this is just silently written to the file when unmounting. That might even be misused as a "feature" to add sequences without going through the append directory - not sure how that messes with the internal file mapping while being mounted.
It might be good to at least give warnings for such misuse ;-)
Cheers,
Lucas