ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

Downstream analysis of assemblies #183

Open rcedgar opened 4 years ago

rcedgar commented 4 years ago

Variant calling suggested by @ababaian from slack:

# GATK 3.x syntax (-T selects the tool); expects $LIBRARY to be set
java -Xmx12G -jar /home/ubuntu/software/GenomeAnalysisTK.jar \
  -R hgr1.gatk.fa -T HaplotypeCaller \
  -ploidy 2 --max_alternate_alleles 6 \
  -I "$LIBRARY.bam" -o "$LIBRARY.hgr1.vcf"

rcedgar commented 4 years ago

Assigning @taltman for coverage plot & variant analysis vs. closest genome if close enough (say, >=97% identity per the minimap2 alignment).

taltman commented 4 years ago

So Rayan runs minimap2 and can provide the sequence of the closest genome as an argument to Darth, right? Then I can do the rest.

On July 17, 2020 10:59:07 AM PDT, Robert Edgar notifications@github.com wrote:

Assigning @taltman for coverage plot & variant analysis vs. closest genome if close enough (say, >=97% identity per the minimap2 alignment).

rcedgar commented 4 years ago

Sounds right to me. We need minimap2 alignments to three separate references (they cannot be combined into one!) anyway for the master table: (a) refseqs, (b) complete genomes including refseqs, and (c) fragments. The SAM file for (a) gives the closest reference genome, though it's not quite trivial: you have to check soft clipping (the nnS operations in the CIGAR string) and %id (derived from the NM:i tag). Soft clipping should be <5% of the genome, and %id should be >= 97%.
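A minimal sketch of that check in Python, under a few assumptions that are not stated in the thread: the soft-clipped fraction is measured against the query (contig) length reconstructed from the CIGAR string, and %id is computed as 1 - NM / alignment columns, where NM is the edit distance. The function names `alignment_stats` and `is_close_enough` are hypothetical, and this is not the existing minimap QC script.

```python
import re

# One CIGAR operation: a count followed by an operation letter (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_stats(sam_line):
    """Return (soft_clip_fraction, percent_identity) for one SAM record.

    Returns None for unmapped records (CIGAR "*") or records without NM:i.
    """
    fields = sam_line.rstrip("\n").split("\t")
    cigar = fields[5]
    if cigar == "*":
        return None
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    soft_clipped = sum(n for n, op in ops if op == "S")
    # Operations that consume query bases; hard clips (H) are not counted.
    query_len = sum(n for n, op in ops if op in "MIS=X")
    # Alignment columns: aligned, inserted and deleted bases.
    aln_cols = sum(n for n, op in ops if op in "MID=X")

    # Optional tags start at the 12th column; NM:i is the edit distance.
    nm_tags = [f for f in fields[11:] if f.startswith("NM:i:")]
    if not nm_tags or query_len == 0 or aln_cols == 0:
        return None
    nm = int(nm_tags[0][5:])

    return soft_clipped / query_len, 100.0 * (1 - nm / aln_cols)


def is_close_enough(sam_line, max_clip=0.05, min_pct_id=97.0):
    """Apply the thresholds from the thread: <5% soft clipping, >=97% id."""
    stats = alignment_stats(sam_line)
    if stats is None:
        return False
    clip_frac, pct_id = stats
    return clip_frac < max_clip and pct_id >= min_pct_id
```

If secondary or supplementary alignments are present, hard clipping would hide part of the query from this calculation, so restricting the check to primary alignments is probably the safer choice.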

taltman commented 4 years ago

I think Rayan and I need to delineate who is doing what.

On July 17, 2020 3:46:51 PM PDT, Robert Edgar notifications@github.com wrote:

Sounds right to me. We need minimap2 alignments to three separate references (they cannot be combined into one!) anyway for the master table: (a) refseqs, (b) complete genomes including refseqs, and (c) fragments. The SAM file for (a) gives the closest reference genome, though it's not quite trivial: you have to check soft clipping (the nnS operations in the CIGAR string) and %id (derived from the NM:i tag). Soft clipping should be <5% of the genome, and %id should be >= 97%.

rchikhi commented 4 years ago

Hey, minimap2 was already run on all assemblies! The results are in the master table: https://serratus-rayan.s3.amazonaws.com/sra_master_table.csv, made using this script: https://gitlab.pasteur.fr/rchikhi_pasteur/serratus-batch-assembly/-/blob/master/master_table/minimap2_contigs.sh. I used @rcedgar's minimap QC script as-is, though, with no additional tuning such as the soft-clipping and %id checks in https://github.com/ababaian/serratus/issues/183#issuecomment-660368829

rchikhi commented 4 years ago

Does it need to be rerun on the new complete genomes clustered at 99% identity, though (https://github.com/ababaian/serratus/issues/204)? The run in https://github.com/ababaian/serratus/issues/183#issuecomment-660461416 is on cov5.