delehef / fusta

A FUSE filesystem to browse & edit FASTA files

Robustness could be improved #5

Closed lczech closed 2 years ago

lczech commented 2 years ago

Hi again,

it might be beneficial to make fusta more robust against user errors. Arguably, these could just be left as user errors, but handling them would make the tool more user-friendly.

For example, FASTA files whose header names contain characters that are not valid in a file system (/, \, ?, etc.) will mount and provide infos and labels files with the correct content, but the fasta and seqs directories will be silently empty.

Similarly, when a sequence is edited to contain invalid FASTA content, such as editing a seqs file so that it contains multiple new sequences, this is just silently written to the file when unmounting. That might even be misused as a "feature" to add sequences without going through the append directory; I am not sure how that interferes with the internal file mapping while the filesystem is mounted.

It might be good to at least give warnings for such misuse ;-)

Cheers Lucas

delehef commented 2 years ago

Could you please attach a FASTA file resulting in such errors, and tell me what OS you are using? I couldn't reproduce the issue.

Edit: I mean the first issue.

delehef commented 2 years ago

Regarding the second issue, this is tricky from a user experience perspective: file operations (write in this case) can only return success or error, and can only communicate this to the calling program, but not directly to the user. I made the conscious choice of just silently accepting sequence "deduplication" instead of failing, but I can understand that it may be surprising.

I have no solution for now, but I'll try to find something.

Cheers,

Franklin

lczech commented 2 years ago

That's on Ubuntu 20.04.4 LTS.

Fasta header issue

This file

# good.fasta
>a
ACGT
>d
GATACA

gives

$ tree fusta/
fusta/
├── append
├── fasta
│   ├── a.fa
│   └── d.fa
├── get
├── infos.csv
├── infos.txt
├── labels.txt
└── seqs
    ├── a.seq
    └── d.seq

4 directories, 7 files

while this file

# bad.fasta
>a/b\c?
ACGT
>d:e"f
GATACA

gives

$ tree fusta/
fusta/
├── append
├── fasta
├── get
├── infos.csv
├── infos.txt
├── labels.txt
└── seqs

4 directories, 3 files
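
One possible way to keep such records visible would be to map each header ID to a file-system-safe name before creating the virtual files. The following is only a minimal sketch under assumptions: `sanitize_id` and `FORBIDDEN` are hypothetical names and not part of fusta, and the character set is a guess (on Linux, strictly only `/` and NUL are forbidden in a file name).

```rust
/// Characters replaced in this sketch. Assumption: on Linux only '/' and NUL
/// are strictly invalid in a file name; the others are included to be
/// conservative towards other systems and shells.
const FORBIDDEN: &[char] = &['/', '\\', '?', ':', '"', '*', '<', '>', '|', '\0'];

/// Hypothetical helper (not fusta's actual code): turn a FASTA record ID into
/// a name that can safely be exposed as a file.
pub fn sanitize_id(id: &str) -> String {
    // Replace every forbidden or control character with an underscore.
    let cleaned: String = id
        .chars()
        .map(|c| if FORBIDDEN.contains(&c) || c.is_control() { '_' } else { c })
        .collect();

    // The empty name, "." and ".." cannot be used as regular file names,
    // so prefix them with an underscore.
    match cleaned.as_str() {
        "" | "." | ".." => format!("_{}", cleaned),
        _ => cleaned,
    }
}
```

With this sketch, the two headers of bad.fasta would map to `a_b_c_` and `d_e_f`. Two different IDs can collapse to the same name, so a caller would still need to check for collisions, and comparing the input with the output would be a natural place to emit a warning.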

Editing and adding weird stuff

Start with the good.fasta from above. Then run vi seqs/a.seq to edit it. The file starts as

ACGT

Edit this to be

ACGT
>c
CAT

then save and unmount. This effectively adds a new sequence to the file, but one that only shows up after unmounting and mounting again. The same procedure can, however, also be used to edit nonsense into the file; this continues to work while mounted, but of course the file cannot be mounted again once the nonsense has been written to it.

As said, that is kind of a user error, so it would be okay to ignore. But maybe this could also be checked, to improve user experience.
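
Such a check could look roughly like the sketch below, which inspects the content written to a sequence file and rejects header lines and unexpected characters. Everything here is an assumption for illustration: `validate_sequence` and `SeqError` are hypothetical names, not fusta's API, and the accepted alphabet (ASCII letters plus `*` and `-`) is a guess at what counts as sequence data.

```rust
use std::fmt;

/// Hypothetical error type describing why an edited sequence was rejected.
#[derive(Debug, PartialEq)]
pub enum SeqError {
    /// A line starts with '>', i.e. it would introduce a new FASTA record.
    HeaderLine { line: usize },
    /// A character outside the assumed sequence alphabet was found.
    InvalidChar { line: usize, ch: char },
}

impl fmt::Display for SeqError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SeqError::HeaderLine { line } => {
                write!(f, "line {}: header lines ('>') are not allowed in a sequence", line)
            }
            SeqError::InvalidChar { line, ch } => {
                write!(f, "line {}: unexpected character {:?} in sequence", line, ch)
            }
        }
    }
}

/// Hypothetical validator: accept only lines made of ASCII letters, '*' and '-'.
/// Line numbers in the returned error are 1-based.
pub fn validate_sequence(data: &str) -> Result<(), SeqError> {
    for (i, raw) in data.lines().enumerate() {
        let line_no = i + 1;
        // Ignore trailing whitespace such as a stray '\r' or spaces.
        let line = raw.trim_end();
        if line.starts_with('>') {
            return Err(SeqError::HeaderLine { line: line_no });
        }
        let bad = line
            .chars()
            .find(|c| !(c.is_ascii_alphabetic() || *c == '*' || *c == '-'));
        if let Some(ch) = bad {
            return Err(SeqError::InvalidChar { line: line_no, ch });
        }
    }
    Ok(())
}
```

Since a file operation can only report success or failure to the calling program, the filesystem could turn an `Err` into a failed write and log the message on its own side; how exactly that would be wired into fusta is left open here.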

Lastly, when reading a broken file, I get

 [INFO] Reading good.fasta...
thread 'main' panicked at 'Duplicated keys', src/fs.rs:582:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

which could also be improved to give a more understandable error.
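
As an illustration of a friendlier failure, and assuming the panic is triggered by two records sharing the same ID, the duplicate could be detected up front and reported as a regular error. The function name `check_unique_ids` and the message wording are hypothetical, not taken from fusta.

```rust
use std::collections::HashSet;

/// Hypothetical pre-mount check: return an explanatory error, instead of
/// panicking, when two FASTA records share the same ID.
pub fn check_unique_ids<'a, I>(ids: I) -> Result<(), String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut seen = HashSet::new();
    for id in ids {
        // `insert` returns false if the ID was already present.
        if !seen.insert(id) {
            return Err(format!(
                "duplicate sequence ID `{}`: record IDs must be unique for the file to be mounted",
                id
            ));
        }
    }
    Ok(())
}
```

The caller could then print this message and exit with a non-zero status rather than unwinding with a backtrace hint.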