Open rcedgar opened 4 years ago
Variant calling suggested by @ababaian from slack:
java -Xmx12G -jar /home/ubuntu/software/GenomeAnalysisTK.jar \
-R hgr1.gatk.fa -T HaplotypeCaller \
-ploidy 2 --max_alternate_alleles 6 \
-I $LIBRARY.bam -o $LIBRARY.hgr1.vcf
Assigning @taltman for coverage plot & variant analysis vs. closest genome if close enough (say, >=97% identity per the minimap2 alignment).
So Rayan runs Minimap, and can provide the sequence of the closest genome as an argument to Darth, right? Then I can do the rest.
Sounds right to me. We need minimap2 alignments to three separate references (they cannot be combined into one!) anyway for the master table: (a) refseqs, (b) complete genomes including refseqs, and (c) fragments. The SAM file for (a) gives the closest reference genome, though it's not quite trivial: you have to check soft-clipping (Snn values in CIGAR) and %id (computed from the edit distance in the NM:i tag). Soft clipping should be <5% of the genome, and %id should be >=97%.
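A minimal sketch of that check, for illustration only: the function names and the `genome_length` argument are hypothetical and do not come from the existing QC script. It assumes %id is computed as (alignment columns - NM) / alignment columns, which is one common convention and not necessarily the one used for the master table, and it takes the "genome" length for the 5% soft-clipping threshold from the caller. Only `S` operations are counted, as described above; hard clips (`H`) are ignored.

```python
import re

# One CIGAR operation: a length followed by an op code (SAM spec op codes).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def nm_from_sam_line(line):
    """Return the NM:i edit distance from a SAM record, or None if absent."""
    for tag in line.rstrip("\n").split("\t")[11:]:  # optional tags start at column 12
        if tag.startswith("NM:i:"):
            return int(tag[5:])
    return None


def soft_clipped_bases(cigar):
    """Total number of soft-clipped bases (the Snn values in the CIGAR)."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "S")


def alignment_columns(cigar):
    """Number of alignment columns: matches/mismatches plus indels."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "M=XID")


def percent_identity(cigar, nm):
    """Assumed definition: 100 * (columns - NM) / columns."""
    cols = alignment_columns(cigar)
    if cols == 0:
        return 0.0
    return 100.0 * (cols - nm) / cols


def is_close_reference(cigar, nm, genome_length,
                       max_clip_frac=0.05, min_pct_id=97.0):
    """Apply the two thresholds from the comment above."""
    clip_frac = soft_clipped_bases(cigar) / genome_length
    return clip_frac < max_clip_frac and percent_identity(cigar, nm) >= min_pct_id
```

For example, `is_close_reference("10S980M10S", 9, 1000)` passes both thresholds, while `is_close_reference("50S900M50S", 10, 1000)` fails on soft clipping (10% of the genome).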
I think Rayan and I need to delineate who is doing what.
Hey, the minimap2 runs were already done on all assemblies! Results are in the master table: https://serratus-rayan.s3.amazonaws.com/sra_master_table.csv, made using this script: https://gitlab.pasteur.fr/rchikhi_pasteur/serratus-batch-assembly/-/blob/master/master_table/minimap2_contigs.sh. But I used @rcedgar's minimap QC script as-is, with no additional tuning such as the checks described in https://github.com/ababaian/serratus/issues/183#issuecomment-660368829
Does it need to be run on the new complete genomes clustered at 99% identity, though (https://github.com/ababaian/serratus/issues/204)? The run in https://github.com/ababaian/serratus/issues/183#issuecomment-660461416 is on cov5.