Open rchikhi opened 3 years ago
Added 3 co-assemblies, with various k-mer sizes suggested by Anton. The largest one by assembly length is k29,43, but you could also have a look at k33,55,77, which is smaller but may have different content.
Added motifator results for the two 'best' co-assemblies (those with highest k value <= 77) using cmdline (1).
Results are in
s3://serratus-rayan/rVert-assembly/motifator-results/rnaviralspades_coassembly_k29,43/
and
s3://serratus-rayan/rVert-assembly/motifator-results/rnaviralspades_coassembly_k33,55,77/
Added motifator results for individual SRA accessions (kept only those with non-empty LHF) using cmdline (1).
Results are in: s3://serratus-rayan/rVert-assembly/motifator-results/individual/
cmdline (1):
transeq -frame 6 $input $input.aa
base=$(basename $input)
./motifator -search_rdrp $input.aa -model rdrp_model.txt \
-tsvout results/$base.tsv \
-report results/$base.txt \
-fevout results/$base.fev \
-medhionly \
-trim_fastaout results/$base.trim.LHF.fa \
-motifs_fastaout results/$base.motifs.fa
To be 100% explicit: no HMMs here were involved :)
Update: re-uploaded s3://serratus-rayan/rVert-assembly/motifator-results/individual/
which, up until this comment, contained the wrong files (I had mistakenly run motifator on the .pro reads rather than on the contigs).
Now motifator has been run on the individual SRAs for unitigs (before_rr.fasta), contigs and scaffolds.
@rcedgar asked "how many contigs give motifator hits?" To attempt to answer this, I ran:
$ grep "high-conf" *.tsv |cut -d"." -f1 |sort|uniq|wc -l
3388
$ grep "medium-conf" *.tsv |cut -d"." -f1 |sort|uniq|wc -l
112
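Note that `cut -d"." -f1` keeps only the part of each file name before the first dot, so the two counts above are numbers of distinct file prefixes (accessions), not numbers of individual contigs. A minimal sketch of the same count wrapped in a function, assuming the TSV files are named `<accession>.<something>.tsv` as produced by cmdline (1); the function name `count_accessions` is hypothetical:

```shell
# count_accessions LABEL [DIR]
# Count distinct accessions that have at least one .tsv line matching LABEL.
# Assumption: TSV file names start with the accession, followed by a dot.
count_accessions() {
    label=$1
    dir=${2:-.}
    (
        cd "$dir" || exit 1
        # grep -l prints each matching file name once; cut keeps the prefix
        # before the first dot; sort -u removes duplicate accessions.
        grep -l -- "$label" *.tsv 2>/dev/null | cut -d. -f1 | sort -u | wc -l | tr -d ' '
    )
}
```

For example, `count_accessions high-conf .` run in the results directory should give the same kind of count as the first command above.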
How many SRA accessions were in rVert:
$ ls ../../fasta/ |cut -d"." -f1 |sort|uniq|wc -l
70070
(Turns out many .pro files are empty, at least 20k.) UPDATE: Due to a bug, I didn't create fasta files for .pro files containing a single read. I will re-run, but it shouldn't change the results much.
How many SRAs were assembled into empty unitigs:
$ find ../individual/ -name "*.before_rr.fasta" -empty|wc -l
51888
How many non-empty contigs:
$ find ../individual/ -name "*.contigs.fasta" |wc -l
18182
Thus 3388/18182 = 18.6% of the non-empty contig sets have a high-confidence RdRp hit. But if you count all SRAs, including those with empty contigs, that number drops to 3388/70070 = 4.8%.
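The arithmetic can be re-checked with a short sketch that takes the counts reported above as inputs (the helper name `pct` is hypothetical):

```shell
# Counts copied from the command outputs above.
high_conf=3388     # accessions with a high-confidence hit
nonempty=18182     # contigs.fasta files found
empty=51888        # empty before_rr.fasta files
total=70070        # accessions in rVert

# pct NUM DEN: print NUM/DEN as a percentage with one decimal place.
pct() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'
}

# Sanity check: empty and non-empty assemblies add up to the total.
[ $((empty + nonempty)) -eq "$total" ] && echo "counts are consistent"

echo "non-empty only: $(pct "$high_conf" "$nonempty")%"
echo "all accessions: $(pct "$high_conf" "$total")%"
```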
Out of the 18k non-empty individual rVert assemblies, 501 have different file sizes between before_rr.fasta and contigs.fasta (cc @asl). The difference is typically not big (~100 bp).
An extreme example: SRR3999033 (12.2kbp vs 8.8kbp).
Other smaller ones: DRR032780, SRR3289253, SRR5085421. Assemblies are in s3://serratus-rayan/rVert-assembly/individual/
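One way to list such accessions is sketched below. It assumes a flat directory where each accession has files named `<acc>.before_rr.fasta` and `<acc>.contigs.fasta`; that layout and the function name `size_diff` are assumptions, and this is not necessarily how the 501 figure was obtained.

```shell
# size_diff DIR
# Print accession, unitig file size and contig file size (in bytes, tab-separated)
# for every accession where the two sizes differ.
# Assumption: files are named <acc>.before_rr.fasta and <acc>.contigs.fasta in DIR.
size_diff() {
    dir=$1
    for unitigs in "$dir"/*.before_rr.fasta; do
        [ -e "$unitigs" ] || continue            # no matching files at all
        acc=$(basename "$unitigs" .before_rr.fasta)
        contigs="$dir/$acc.contigs.fasta"
        [ -s "$contigs" ] || continue            # skip missing or empty contigs
        u=$(wc -c < "$unitigs" | tr -d ' ')
        c=$(wc -c < "$contigs" | tr -d ' ')
        if [ "$u" -ne "$c" ]; then
            printf '%s\t%d\t%d\n' "$acc" "$u" "$c"
        fi
    done
}
```

Running `size_diff DIR | wc -l` would then count the differing accessions. Byte size is only a proxy here: two files of equal size can still differ in content.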
Results are here:
s3://serratus-rayan/rVert-assembly/
Folder structure:
Data:
Results:
Scripts: