SciLifeLab / facs

Fast and Accurate Classification of Sequences using Bloom filters
http://facs.scilifelab.se/

facs remove -o #21

Closed henrikstranneheim closed 10 years ago

henrikstranneheim commented 11 years ago

When using the -o option: -o /proj/b2012037/private/henrik/Example/Example1

yields: /proj/b2012037/private/henrik/Example/Example1Example1_Homo_sapiens.GRCh37.70_nochr_clean.fastq

which makes "-o" look more like an output-directory option. When "-o" is specified, the suffix "Bloom_filter.clean" or "Bloom_filter.contaminants" should be appended to the given file name (if one is given) --> Example1_Homo_sapiens_GRCh37.clean & Example1_Homo_sapiens_GRCh37.contaminants. If the "-o" path ends in a "/", indicating a directory, then the complete entry as in the first paragraph should be used.
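The naming rule requested above could be sketched roughly like this in Python. This is only an illustration: `output_names` and its parameters are hypothetical and not part of facs, and the directory case assumes the file name is built as dataset name plus Bloom filter name.

```python
import os


def output_names(out, dataset, bloom_filter):
    """Return (clean, contaminants) paths for a given -o value.

    Hypothetical helper illustrating the request; not part of facs.
    """
    if out.endswith("/"):
        # -o names a directory: build the file name from dataset and filter
        # (assumed interpretation of "the complete entry")
        stem = os.path.join(out, f"{dataset}_{bloom_filter}")
    else:
        # -o names a file prefix: only append the Bloom filter name
        stem = f"{out}_{bloom_filter}"
    return stem + ".clean", stem + ".contaminants"


if __name__ == "__main__":
    print(output_names("Example/Example1", "Example1", "Homo_sapiens_GRCh37"))
```

With a prefix such as `Example/Example1`, the helper only appends the filter name and the two suffixes, so the dataset name is not duplicated.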

brainstorm commented 11 years ago

@henrikstranneheim, @tzcoolman, @arvestad, I think the canonical behaviour for any UNIX command is to write its output to exactly the name that the user has specified:

-o whatever

should generate "whatever" as an output file, nothing more, nothing less. Otherwise it's very confusing, IMHO.

henrikstranneheim commented 11 years ago

Yeah agreed. The trouble is that we are creating 2 files from 1. And if we are using a list we are creating 2 files for every Bloom filter specified. Here is a suggestion:

-od/--outdirectory --> Files are named according to: datasetName_BloomfilterName.clean.infileformat & datasetName_BloomfilterName.contaminants.infileformat

-fp/--fileprefix FileName (as specified by the user)

-fs appends .clean and .contaminants respectively to the "-fp" option. -fs takes two values, where the 1st is for the cleaned file and the 2nd is for the contaminated reads, which should be stated clearly in the help text.

You cannot use -od and -fp together. You must use -fp and -fs together.

Thoughts?
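For discussion, the option rules proposed above could be prototyped with Python's `argparse`. This is a sketch under assumptions: the parser name `facs-remove-sketch` and the helpers `build_parser` and `parse` are hypothetical, and this is not the actual facs command line.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the proposal; not the actual facs CLI.
    parser = argparse.ArgumentParser(prog="facs-remove-sketch")
    # -od and -fp cannot be used together
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-od", "--outdirectory")
    group.add_argument("-fp", "--fileprefix")
    # -fs takes two values: 1st for the clean file, 2nd for the contaminants
    parser.add_argument("-fs", nargs=2, metavar=("CLEAN", "CONTAMINANTS"))
    return parser


def parse(argv):
    parser = build_parser()
    args = parser.parse_args(argv)
    # -fp and -fs must be given together (or both omitted)
    if (args.fileprefix is None) != (args.fs is None):
        parser.error("-fp and -fs must be used together")
    return args


if __name__ == "__main__":
    print(parse(["-fp", "Example1", "-fs", ".clean", ".contaminants"]))
```

Invalid combinations make `argparse` print a usage message and exit with a non-zero status, which would also cover the "stated clearly in the help text" part if help strings were added.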

brainstorm commented 11 years ago

In general, I think this functionality should live outside of facs, scripted as an example in the documentation (a for loop that goes through the generated Bloom filters), e.g.:

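        import os
        import facs  # assumes the facs Python bindings are installed

        reference_dir = "references"   # hypothetical directory holding the reference files
        qry = "test_fname.fastq"       # query file to be cleaned
        for ref in os.listdir(reference_dir):
            # one Bloom filter per reference, named after the reference file
            bf = os.path.splitext(ref)[0] + ".bloom"
            facs.remove(qry, bf)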

Or way simpler in bash.

But yes, @henrikstranneheim, I agree with your specification if it gets implemented.

tzcoolman commented 10 years ago

@brainstorm it was actually fixed several months ago. I'll close it in the next PR.

brainstorm commented 10 years ago

Actually, that's true: we agreed on using stdout for clean sequences and stderr for contaminated ones. Please make sure this is documented in the facs remove help text and that it works as advertised for all flag combinations and edge cases (empty file, wrong format, etc.).
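With that convention, a caller can recover the two files by redirecting the two streams. Here is a minimal Python sketch: `split_streams` is a hypothetical helper, and the command is passed in by the caller because the exact `facs remove` arguments are not shown in this thread.

```python
import subprocess


def split_streams(cmd, clean_path, contaminants_path):
    """Run cmd, saving stdout to clean_path and stderr to contaminants_path.

    Hypothetical helper; cmd is whatever command line the caller would
    normally use to run the tool (given as a list of strings).
    """
    with open(clean_path, "wb") as clean, open(contaminants_path, "wb") as contam:
        # stdout carries the clean sequences, stderr the contaminated ones
        return subprocess.run(cmd, stdout=clean, stderr=contam).returncode
```

Note that any diagnostic messages the tool also writes to stderr would end up in the contaminants file, which is one edge case worth testing.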

Now that you mention your PR... can you please review and merge mine if correct? ;)