chris-kreitzer closed this issue 3 years ago.
Make a BED file (columns: chr, pos (0-based), end pos (1-based), id):
bcftools query -f'%CHROM\t%POS0\t%END\t%ID\n' file.bcf
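The coordinate shift behind `%POS0` and `%END` can be checked by hand. BED intervals are 0-based and half-open, so the start is the VCF POS minus one and the end is the last reference base covered by REF, i.e. POS + length(REF) - 1, which should match `%END` for records without an INFO/END tag. A minimal awk sketch, assuming an uncompressed VCF whose REF column spells out the whole allele (no INFO/END, no symbolic alleles); `vcf_to_bed` is a hypothetical helper name, and bcftools query remains the tool of choice on real data.

```shell
#!/usr/bin/env bash
# Convert an uncompressed VCF (read from stdin) into a 4-column BED:
# chrom, start (0-based), end (1-based, i.e. half-open interval), id.
# Assumption: REF spells out the full reference allele (no INFO/END, no symbolic alleles).
vcf_to_bed() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { next }                          # skip header lines
         {
             start = $2 - 1                     # VCF POS is 1-based -> BED start is 0-based
             end   = $2 + length($4) - 1        # last reference base covered by REF (1-based)
             print $1, start, end, $3
         }'
}
```

Usage (file names hypothetical): `vcf_to_bed < calls.vcf > calls.bed`. A SNP at POS 100 becomes `99 100`; a 4-base REF at POS 200 becomes `199 203`.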
A) Updated the mpileup function (including the +plugin for "INFO/AF") and modified the for-loop; check scripts.
B) Look into bedtools again; filter variants with VAF > 0.2 and create a BED file for bedtools (with bcftools query; see comment above!).
05/04:
A) Ask why it doesn't work chromosome-wise (revise the script)
B) Variants are filtered; twist is selected (INDEL)
C) Check whether I can get bedtools running; optionally with bcftools query (selecting only the columns needed)
A) Updated bcftools mpileup pipeline (filtering AF > 0.2)
B) bedtools is running; look into its input parameters
C) Filter twist variants
D) Speak with Juan (chromosome-wise); discuss the results (SnpEff?)
05/04:
A) Updated the bcftools (variant-calling) script to run chromosome-wise:
ln -s /proj/ferrer/rna_raw/*bam .
# the trailing . (current directory) is needed as link target when several BAMs match
module load samtools; samtools index <.bam>
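With many BAM symlinks, the indexing step can be wrapped in a loop. A sketch; `index_all_bams` and the `DRY_RUN` switch are hypothetical additions that let you preview the samtools commands before actually running them.

```shell
#!/usr/bin/env bash
# Index every BAM in the current directory (e.g. the symlinks created above).
# Hypothetical switch: DRY_RUN=1 only prints the commands instead of running samtools.
index_all_bams() {
    local bam
    for bam in *.bam; do
        [ -e "$bam" ] || continue          # no match: the glob stays literal, so skip it
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo samtools index "$bam"
        else
            samtools index "$bam"          # writes <file>.bam.bai next to the BAM
        fi
    done
}
```

Run `DRY_RUN=1 index_all_bams` first to check the file list, then `index_all_bams` after `module load samtools`.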
A) Used bedtools to compute the intersection between the VCF and the tcs_exon_gtf file --> discuss the output with Juan!
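To sanity-check the bedtools output, the overlap rule can be spelled out explicitly. Both VCF POS and GTF start/end are 1-based and inclusive, so a variant spanning POS..POS+length(REF)-1 overlaps an exon when POS <= exon_end and POS+length(REF)-1 >= exon_start. The awk sketch below reports each VCF record that overlaps at least one exon, roughly what `bedtools intersect -u` reports; it is brute force and only meant for small checks. `variants_in_exons` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Usage: variants_in_exons calls.vcf annotation.gtf
# Prints each VCF record (uncompressed) that overlaps at least one GTF feature
# of type "exon". Brute force O(variants x exons); the GTF must be non-empty
# because the NR == FNR test identifies the first file read.
variants_in_exons() {
    awk 'BEGIN { FS = "\t" }
         NR == FNR {                                   # first file: the GTF
             if ($0 !~ /^#/ && $3 == "exon") {
                 n++; chr[n] = $1; s[n] = $4 + 0; e[n] = $5 + 0
             }
             next
         }
         /^#/ { next }                                 # second file: skip VCF header
         {
             vstart = $2 + 0
             vend = $2 + length($4) - 1                # 1-based, inclusive span of REF
             for (i = 1; i <= n; i++)
                 if (chr[i] == $1 && vstart <= e[i] && vend >= s[i]) { print; break }
         }' "$2" "$1"
}
```

The deletion at POS 98 with a 4-base REF (98..101) counts as exonic for an exon starting at 100, which is why INDEL overlaps are computed from the REF length rather than from POS alone.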
05/07:
A) Off-targets: rewrote the mpileup script to keep only variants with AF > 0.2 & max-missing 0.9 (i.e. at most 10% missing genotypes) & INDELs only; this file was used for the BEDTOOLS intersection with the GTF (exons only)
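The three filters can be written out explicitly, which helps when checking what the pipeline kept. With bcftools this is roughly `bcftools view -v indels -i 'INFO/AF>0.2 && F_MISSING<=0.1'` (vcftools calls the missingness filter `--max-missing 0.9`). The awk sketch below is a transparent stand-in, assuming an uncompressed VCF with GT as the first FORMAT field; it approximates "INDEL" as REF and first ALT differing in length and only looks at the first ALT allele. `filter_offtarget_indels` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Reads an uncompressed multi-sample VCF on stdin and keeps records that
#   (1) have INFO/AF > 0.2 (first ALT allele only),
#   (2) have at most 10% missing genotypes (GT of ".", "./." or ".|."), and
#   (3) are INDELs, approximated as REF and first ALT differing in length.
# Header lines are passed through unchanged.
filter_offtarget_indels() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ { print; next }
         {
             # (1) allele frequency from the INFO column (column 8)
             af = -1
             n = split($8, info, ";")
             for (i = 1; i <= n; i++)
                 if (info[i] ~ /^AF=/) { split(substr(info[i], 4), afs, ","); af = afs[1] + 0 }
             if (af <= 0.2) next

             # (2) fraction of samples (columns 10..NF) with a missing genotype
             missing = 0
             for (i = 10; i <= NF; i++) {
                 split($i, fmt, ":")                   # GT is the first FORMAT field
                 if (fmt[1] == "." || fmt[1] == "./." || fmt[1] == ".|.") missing++
             }
             if (NF >= 10 && missing / (NF - 9) > 0.1) next

             # (3) INDEL: REF and first ALT allele differ in length
             split($5, alts, ",")
             if (length($4) == length(alts[1])) next

             print
         }'
}
```

Usage (file names hypothetical): `filter_offtarget_indels < calls.vcf > offtarget_indels.vcf`.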
B) Look into the intersected file (variants) and discuss the output with Juan
C) Automated Cellranger (for-loop compatible); however, make sure that all FASTQ files are in the same (parent) folder, otherwise it will not work.
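Because the loop only works when all FASTQs share one folder, a quick pre-check before launching Cellranger can save a failed run. A sketch, assuming the files end in `.fastq.gz`; `check_fastq_dir` is a hypothetical name, and the folder it prints is what would be passed to Cellranger's `--fastqs` option.

```shell
#!/usr/bin/env bash
# Usage: check_fastq_dir <root>
# Succeeds (exit 0) and prints the folder if every *.fastq.gz under <root>
# sits in one single directory; otherwise lists the directories and fails.
check_fastq_dir() {
    local dirs n
    dirs="$(find "$1" -name '*.fastq.gz' -exec dirname {} \; | sort -u)"
    n="$(printf '%s' "$dirs" | grep -c .)"     # number of distinct folders (0 if none)
    if [ "$n" -eq 1 ]; then
        echo "$dirs"
    else
        echo "expected FASTQs in exactly one folder, found $n:" >&2
        [ -n "$dirs" ] && echo "$dirs" >&2
        return 1
    fi
}
```

In the loop, something like `fq_dir="$(check_fastq_dir "$root")" || exit 1` stops before Cellranger is started on an incomplete layout (`root` and `fq_dir` are hypothetical variable names).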
Revise the current pipeline:
A) Use RUN to loop through samples (parallelisation on the cluster)
export RUN=0
# initialise the variable (BASH arrays start at index 0 rather than 1)
# determine the number of elements within a list, e.g.:
a=(3 4 5 6)
# array with 4 elements (can be .bam files, etc.)
echo "${#a[@]}"
# prints the number of elements within a; in this case 4
# for-loop (e.g. for sbatch)
for RUN in {0..15}  # important: BASH starts with 0; the range uses .. (two dots)
do
    echo "$RUN"
done
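On the cluster, RUN is typically not looped over inside one job but taken from the scheduler, so that each array task handles one sample (or one chromosome). A sketch assuming SLURM, where `sbatch --array=0-15 job.sh` sets `SLURM_ARRAY_TASK_ID` in each task; `pick_by_index` is a hypothetical helper that maps that 0-based index onto a bash array and refuses out-of-range indices.

```shell
#!/usr/bin/env bash
# Usage: pick_by_index <index> <item0> <item1> ...
# Prints the item at the 0-based <index>; fails if the index is out of range.
pick_by_index() {
    local idx="$1"
    shift
    local items=("$@")
    if [ "$idx" -lt 0 ] || [ "$idx" -ge "${#items[@]}" ]; then
        echo "index $idx out of range (0..$(( ${#items[@]} - 1 )))" >&2
        return 1
    fi
    echo "${items[$idx]}"
}

# Example job script lines (paths hypothetical), submitted with: sbatch --array=0-15 job.sh
#   list=(/path/to/bams/*.bam)
#   RUN="${SLURM_ARRAY_TASK_ID:-0}"     # fall back to 0 when run outside SLURM
#   bam="$(pick_by_index "$RUN" "${list[@]}")" || exit 1
```

The range check matters because `--array=0-15` launches 16 tasks; if the list holds fewer files, the extra tasks exit with an error instead of silently running on an empty name.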
arrays
a=(3 4 6 3 2)
echo "${a[*]}"
# print every entry in array a
echo "${a[0]}"
# print the first element of array a
B) bcftools: instead of running over samples, we can also run over chromosomes; N. vectensis has 15 chromosomes. RUN=0; chr=15 (basically as in the for-loop above):
module load bcftools
bcftools mpileup -f <reference.fa> -r "$chr" "${list[@]}" | bcftools call <some output arguments>
# check the -r argument for chromosomes and the output commands
# -f: mpileup needs the (indexed) reference FASTA the reads were mapped to; -r needs indexed BAMs
C) AF tag: filter out variants with AF < 0.2 (most likely sequencing artefacts; also check the sequencing depth)
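To look at AF and depth side by side before choosing a cut-off, the relevant INFO tags can be tabulated. With bcftools this is roughly `bcftools view -i 'INFO/AF>0.2'` piped into `bcftools query -f '%CHROM\t%POS\t%INFO/AF\t%INFO/DP\n'`. The awk sketch below does the same on an uncompressed VCF, assuming AF and DP are stored in INFO (first ALT allele only); `af_dp_table` is a hypothetical name and 0.2 is just the threshold used in these notes.

```shell
#!/usr/bin/env bash
# Usage: af_dp_table [min_af] < calls.vcf
# Prints CHROM, POS, AF and DP for records whose INFO/AF (first ALT) is above
# min_af (default 0.2), so that low-depth calls can be spotted by eye.
# A missing DP tag is printed as ".".
af_dp_table() {
    awk -v min_af="${1:-0.2}" 'BEGIN { FS = OFS = "\t" }
        /^#/ { next }
        {
            af = "."; dp = "."
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^AF=/) { split(substr(info[i], 4), a, ","); af = a[1] }
                else if (info[i] ~ /^DP=/) dp = substr(info[i], 4)
            }
            if (af != "." && af + 0 > min_af + 0) print $1, $2, af, dp
        }'
}
```

A variant passing the AF filter but supported by only a handful of reads is still a candidate artefact, which is why DP is printed next to AF rather than filtered silently.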
D) Finding intersections
module load bedtools
bedtools intersect -a <file.vcf> -b tcs_CDS.gtf
# if GTF is not supported, convert it to GFF with gffread
# gffread is helpful for converting GTF --> GFF, extracting transcript information, etc.
E) snpeff/4.3
Use this tool to predict the effect of variants (e.g. nonsense, frameshift, nonstop).
Overall goal:
Many variants will certainly be found in both mutant and wild-type animals. These mainly arise because the reference (to which the reads were mapped) and the sequenced samples differ. However, we are more interested in whether there is a specific difference between MUTANT and WILD-type animals.
--> Are variants found in one MUTANT also found in the other replicates, or are they just artefacts? --> WILD-type animals should not have any INDEL at twist.
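The replicate question can be answered mechanically: keep variants present in every mutant and absent from the wild type. `bcftools isec` covers this for compressed, indexed files; the awk sketch below shows the same logic on uncompressed VCFs using an exact CHROM:POS:REF:ALT key. Assumptions: one VCF per animal, and INDELs represented identically across files (normalising first, e.g. with `bcftools norm`, helps here). `mutant_specific` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Usage: mutant_specific <wildtype.vcf> <mutant1.vcf> [<mutant2.vcf> ...]
# Prints CHROM:POS:REF:ALT keys that occur in EVERY mutant VCF and in no
# record of the wild-type VCF. Inputs must be uncompressed VCF files.
mutant_specific() {
    local wt="$1"
    shift                                       # "$@" now holds only the mutant files
    awk -v wt_file="$wt" -v n_mut="$#" 'BEGIN { FS = "\t" }
        /^#/ { next }                           # skip header lines
        {
            key = $1 ":" $2 ":" $4 ":" $5       # exact match on position and alleles
            if (FILENAME == wt_file) { in_wt[key] = 1; next }
            if (!((FILENAME, key) in seen)) {   # count each key once per mutant file
                seen[FILENAME, key] = 1
                count[key]++
            }
        }
        END {
            for (k in count)
                if (count[k] == n_mut && !(k in in_wt)) print k
        }' "$wt" "$@" | LC_ALL=C sort
}
```

Variants seen in only one mutant replicate drop out, which matches the notes' suspicion that such calls are likely artefacts.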