Is assembly from subsampled reads representative?

blahah / assemblotron-paper

Paper for Assemblotron

MIT License

Is assembly from subsampled reads representative? #1

Open blahah opened 9 years ago

blahah commented 9 years ago

Experiment:

[ ] take a set of long-ish paired-end reads from a published RNA-seq experiment in a model organism
[ ] reservoir sampling method: make subsets at 80%, 50%, and 20% of the reads (in triplicate?)
[ ] graph sampling method: make subsets at 80%, 50%, and 20% of the partitions (in triplicate?)
[ ] run assemblotron on each subsample, varying only k and using only soapdt
[ ] plot the score distribution over k for each subsample
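
A minimal sketch of the reservoir sampling step for paired-end reads, assuming plain (uncompressed) FASTQ with four lines per record and mates in the same order in both files. It is not Assemblotron's own sampler; the function names (`read_fastq`, `reservoir_sample`, `subsample_pairs`) are hypothetical and only illustrate the idea. Mates are zipped together before sampling so each pair is kept or dropped as a unit.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # header, sequence, separator, qualities
Pair = Tuple[Record, Record]        # (left mate, right mate)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group an iterable of FASTQ lines into 4-line records (hypothetical helper)."""
    stripped = (line.rstrip("\n") for line in lines)
    # Passing the same iterator four times makes zip consume 4 lines per record.
    return zip(stripped, stripped, stripped, stripped)


def reservoir_sample(items: Iterable, k: int, rng: random.Random) -> List:
    """Algorithm R: a uniform sample of k items from a stream in one pass."""
    reservoir: List = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def subsample_pairs(left_lines: Iterable[str],
                    right_lines: Iterable[str],
                    k: int,
                    seed: int) -> List[Pair]:
    """Sample k read pairs; a different seed gives a different replicate."""
    rng = random.Random(seed)
    pairs = zip(read_fastq(left_lines), read_fastq(right_lines))
    return reservoir_sample(pairs, k, rng)
```

To turn a percentage into `k`, the total number of pairs has to be known first (for example from a counting pass over the file), so that `k = round(fraction * total_pairs)`. Replicates then come from calling `subsample_pairs` with the same `k` and a different `seed` each time.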

cboursnell commented 9 years ago

Yeah, I agree: do at least triplicate for the sampling. 5 or more times would be even better.

blahah commented 9 years ago

OK, let's go with 5 to start with. We need to balance robustness with computation time. Perhaps we don't need to take large samples, like 80%. We could focus on smaller ones, perhaps starting with 5%, 10%, and 20%? My thinking is that we don't really care whether larger samples are representative, because they don't save much computation. We just want to show whether samples of the sizes we might actually use are useful.

blahah commented 9 years ago

OK, I've done sweeps of both the yeast and Arabidopsis datasets (see #7). Results to follow.