SciLifeLab / facs

Fast and Accurate Classification of Sequences using Bloom filters
http://facs.scilifelab.se/

Add plots for the original FACS 1.0 dataset #106

Open brainstorm opened 10 years ago

brainstorm commented 10 years ago

We should get some metrics (runtime/accuracy) from the old dataset (the one used in the original paper) with the current version of FACS.

Preferably in an automated/reproducible way. Manually if that is too cumbersome.

@tzcoolman, @henrikstranneheim, can you take care of that?

tzcoolman commented 10 years ago

@brainstorm I thought I had done that before. Plus, FastQ Screen only takes datasets in FASTQ format, while the old one is in FASTA. FACS 2.0 and DeconSeq have slightly different runtimes when handling FASTA and FASTQ files.

brainstorm commented 10 years ago

Are those plots relevant today given all the changes introduced to FACS since then?

Can they be regenerated/made by anyone other than you and/or Henrik? Where is the code to do so?

Thanks Enze!

tzcoolman commented 10 years ago

It is reproducible of course, though I didn't use a script to run it. All I have is the relevant data (same as what Henrik did in the paper, only counting E. coli and human chr 8 and 22). @brainstorm

tzcoolman commented 10 years ago

@arvestad @guillermo-carrasco @brainstorm I rebuilt a synthetic dataset using simNGS that has exactly the same composition (species types) and similar proportions as the old one (Henrik's). It can easily be merged into the standard Python testing module (ecoli ref test). Should I just send it to you directly, or should I run the test myself?

guillermo-carrasco commented 10 years ago

Hi!

If you generated it with simNGS, it should be easily reproducible, right? We just need the generation parameters to get the same output, no need to send lots of GB through the network!

tzcoolman commented 10 years ago

@guillermo-carrasco I just don't understand. All I used were the default settings. Do you mean all the ref genomes that Henrik used before?

brainstorm commented 10 years ago

@tzcoolman You should write a Python test that:

1) Builds the old dataset using the "default setting" you mention.
2) Runs FACS against it.
3) Generates data points, or even plots, out of it.
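
A minimal sketch of how those three steps could be wired together in a test, under loud assumptions: `run_benchmark`, `toy_dataset` and `toy_classifier` are hypothetical names, and the two toy functions are stand-ins for the real simNGS dataset generation and the real FACS query, neither of which is invoked here. Only the shape of the test (build, run, collect runtime/accuracy data points) is illustrated.

```python
import time
from typing import Callable, Dict, List, Tuple

# (sequence, truly_from_reference): ground truth is known for synthetic reads.
Read = Tuple[str, bool]


def run_benchmark(build_dataset: Callable[[], List[Read]],
                  classify: Callable[[str], bool]) -> Dict[str, float]:
    """Build a dataset, classify every read, and return data points."""
    # Step 1: build the dataset (a real test would generate it with simNGS).
    reads = build_dataset()

    # Step 2: run the classifier against it and time only this step.
    start = time.perf_counter()
    calls = [classify(seq) for seq, _ in reads]
    runtime = time.perf_counter() - start

    # Step 3: turn the calls into data points that can be plotted later.
    tp = sum(1 for (_, truth), call in zip(reads, calls) if truth and call)
    fn = sum(1 for (_, truth), call in zip(reads, calls) if truth and not call)
    tn = sum(1 for (_, truth), call in zip(reads, calls) if not truth and not call)
    fp = sum(1 for (_, truth), call in zip(reads, calls) if not truth and call)
    return {
        "reads": len(reads),
        "runtime_s": runtime,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(reads) if reads else 0.0,
    }


def toy_dataset() -> List[Read]:
    """Hypothetical stand-in for the simulated dataset."""
    return [
        ("ACGTACGT", True),
        ("ACGTTTTT", True),
        ("GGGGCCCC", False),
        ("TTACGTGG", False),  # shares a k-mer with the reference by chance
    ]


def toy_classifier(seq: str) -> bool:
    """Hypothetical stand-in for a FACS query: flags reads containing 'ACGT'."""
    return "ACGT" in seq


points = run_benchmark(toy_dataset, toy_classifier)
print(points)
```

Passing the dataset builder and the classifier in as callables keeps the measuring code independent of how the reads are produced, so the same test could later be pointed at the real tools without changing how the data points are computed.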

guillermo-carrasco commented 10 years ago

What I meant is that you don't have to send the dataset, just a "how-to" for generating it. Automating the procedure as @brainstorm suggests would be the best solution, of course.

tzcoolman commented 10 years ago

@brainstorm It cannot be fully automatic. For at least 20 of the ref genomes, I don't know where to download them automatically.

brainstorm commented 10 years ago

Alright, how big are those? Can you put them on your public dropbox account for now? I’ll figure out a better location for them.

tzcoolman commented 10 years ago

@brainstorm 64 MB without the human chromosome refs. I guess for human HG19 chr 8 and chr21, it will be easy to download them automatically.

tzcoolman commented 10 years ago

@brainstorm I'll leave it on Lars' desktop 'turing'. I haven't been able to use Dropbox since June. It's about 100 MB unzipped.