PROBIC / mSWEEP

mSWEEP High-resolution sweep metagenomics using fast probabilistic inference
MIT License
Kallisto reference index for multi-contig assemblies #2

Closed (aweimann closed this issue 5 years ago)

aweimann commented 6 years ago

Hello again,

After a successful installation, I want to build the reference index with kallisto. I suppose the right command is kallisto index -i example_kmi (the pre-processing section says kallisto pseudo, whereas the toy example says kallisto index). My reference assemblies each consist of multiple contigs, but the readme asks for one FASTA file with all the reference sequences and a clustering file with as many cluster assignments. This seems a bit impractical. Please correct me if I'm wrong, but I assume I would have to merge all my assemblies into one very large FASTA file and provide a cluster assignment file with as many rows as there are contigs in all assemblies combined?

Thanks, Aaron

tmaklin commented 6 years ago

Hi, great to hear the installation worked. The pre-processing section should indeed say index instead of pseudo, thanks!

You should merge the contigs of each assembly into a single sequence, place these merged sequences in one big FASTA file, and then provide the cluster assignments for those sequences (one assignment per merged sequence, not one per contig). I hope this makes sense; the readme should probably be expanded a bit on this part.
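
The merging step described above could be sketched in Python roughly as follows. This is only an illustration, not part of mSWEEP: the function names, the `(name, path, cluster)` input format, and the `spacer` parameter are hypothetical, and it assumes that the cluster file holds one label per line in the same order as the sequences in the merged FASTA. Contigs are joined by plain concatenation, which is one possible reading of "merge the contigs into a single sequence".

```python
def read_contigs(fasta_path):
    """Return the contig sequences found in a FASTA file, in file order."""
    contigs, current = [], []
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header ends the previous contig, if there was one.
                if current:
                    contigs.append("".join(current))
                current = []
            else:
                current.append(line)
    if current:
        contigs.append("".join(current))
    return contigs


def merge_assemblies(assemblies, out_fasta, out_clusters, spacer=""):
    """Write one merged sequence and one cluster label per assembly.

    assemblies: list of (name, fasta_path, cluster_label) tuples
                (hypothetical input format chosen for this sketch).
    spacer:     string placed between contigs; empty means plain
                concatenation (an assumption, not a documented requirement).
    """
    with open(out_fasta, "w") as fasta, open(out_clusters, "w") as clusters:
        for name, path, cluster in assemblies:
            merged = spacer.join(read_contigs(path))
            fasta.write(f">{name}\n{merged}\n")
            # Same order as the FASTA records: row i labels sequence i.
            clusters.write(f"{cluster}\n")
```

With the merged file written (say, to a hypothetical `reference.fasta`), the index would then be built with something like `kallisto index -i example_kmi reference.fasta`, and the cluster file has as many rows as there are assemblies rather than contigs.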