Issue opened by Duhyadi 5 years ago.
Hi! I would like to help you, but I don't know exactly which data I should use! Please indicate the data location in the issue.
Hi, it looks like you need the reference genome of your study species. I suggest you first download the reference genome in FASTA format and put it in your repository.
Hi! Here is all the info about the bcftools plugin +setGT. I hope it is helpful.
Once you have all the required files, you could try executing this script like this:
./vcf2fq.pl -f <input.fasta> <all-site.vcf> > <output.fastq>
I think you must first install Perl.
Hi! I found another command that does what you need. Check it out:
bcftools consensus all-site.vcf.gz < input.fasta > output.fasta
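To make the idea concrete, here is a minimal Python sketch of what a consensus step does in the simplest case: substituting each SNP's ALT allele into the reference. It is only an illustration under stated assumptions: single-base REF/ALT SNPs, 1-based VCF positions, and no handling of genotypes, indels, or filters (bcftools consensus handles all of these). The sequence and variants are hypothetical toy data.

```python
from typing import Dict, List, Tuple

# Hypothetical SNP record: (CHROM, POS (1-based), REF, ALT), as in VCF columns
Snp = Tuple[str, int, str, str]


def apply_snps(reference: Dict[str, str], snps: List[Snp]) -> Dict[str, str]:
    """Return a copy of `reference` with each SNP's ALT base substituted.

    Simplified sketch: only single-base REF/ALT pairs are accepted and
    genotype columns are ignored.
    """
    seqs = {name: list(seq) for name, seq in reference.items()}
    for chrom, pos, ref, alt in snps:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"not a SNP: {chrom}:{pos} {ref}>{alt}")
        i = pos - 1  # VCF POS is 1-based; Python indices are 0-based
        # Compare case-insensitively: soft-masked (dna_sm) FASTA uses lowercase
        if seqs[chrom][i].upper() != ref.upper():
            raise ValueError(f"REF mismatch at {chrom}:{pos}")
        seqs[chrom][i] = alt
    return {name: "".join(bases) for name, bases in seqs.items()}


if __name__ == "__main__":
    toy_reference = {"chr1": "ACGTACGTAC"}
    toy_snps = [("chr1", 2, "C", "T"), ("chr1", 9, "A", "G")]
    print(apply_snps(toy_reference, toy_snps)["chr1"])
```

The REF check is a deliberate design choice: if the VCF was called against a different assembly than the FASTA, the mismatch surfaces immediately instead of producing a silently wrong sequence.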
You can download bcftools here.
Then decompress it and run:
cd bcftools-1.9
./configure --prefix=/where/to/install
make install
This is the link to download the Zea mays reference genome: https://plants.ensembl.org/info/website/ftp/index.html
Hi! Here is some info about indexing a FASTA reference genome with SAMtools.
Hi, to use a reference genome it has to be indexed.
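For reference, samtools faidx writes a .fai file with five tab-separated columns per sequence: name, length, byte offset of the first base, bases per line, and bytes per line. The minimal Python sketch below computes those columns for a small in-memory FASTA so the format is easier to picture. It assumes Unix line endings, no blank lines, and equal-width sequence lines within each record; it is an illustration, not a replacement for samtools faidx.

```python
from typing import List, Optional, Tuple

FaiEntry = Tuple[str, int, int, int, int]


def fai_entries(fasta_text: str) -> List[FaiEntry]:
    """Compute .fai-style columns for each FASTA record.

    Columns: name, sequence length, byte offset of the first base,
    bases per line, bytes per line (including the newline).
    """
    entries: List[FaiEntry] = []
    offset = 0  # byte offset of the current line within the file
    name: Optional[str] = None
    length = seq_offset = linebases = linewidth = 0
    for line in fasta_text.splitlines(keepends=True):
        if line.startswith(">"):
            if name is not None:
                entries.append((name, length, seq_offset, linebases, linewidth))
            name = line[1:].split()[0]  # the name stops at the first whitespace
            length = linebases = linewidth = 0
            seq_offset = offset + len(line)  # first base follows the header line
        else:
            bases = len(line.rstrip("\n"))
            if linebases == 0:  # the first sequence line sets the line geometry
                linebases, linewidth = bases, len(line)
            length += bases
        offset += len(line)
    if name is not None:
        entries.append((name, length, seq_offset, linebases, linewidth))
    return entries


if __name__ == "__main__":
    demo = ">chr1 toy\nACGT\nAC\n>chr2\nGGGG\n"
    for entry in fai_entries(demo):
        print("\t".join(map(str, entry)))
```

Storing the byte offset and line geometry is what lets tools jump straight to any region of a multi-gigabyte genome without reading the file from the start.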
Hi! Here are the things that we have to do BEFORE trying to convert to fasta:
[x] Download the corn (maize) reference genome.
[x] Download the GFF file corresponding to the reference genome.
[x] Index the reference genome with bwa or samtools.
[x] Compress the VCF file with bgzip, which ships with htslib/tabix rather than bcftools (example here): bgzip file.vcf
[x] Index the compressed VCF file with tabix: tabix file.vcf.gz
[x] Run the command:
bcftools consensus -f ref_geneA.fa calls.vcf.gz > consensus.fa
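A common stumbling block in the checklist above is compressing the VCF with plain gzip instead of bgzip: tabix only accepts BGZF, which is gzip with an extra header field containing a "BC" subfield. The minimal Python sketch below checks for that subfield in the first bytes of a file. The 28-byte constant is the empty BGZF block that bgzip appends as an end-of-file marker; the file name in the usage note is hypothetical.

```python
import gzip


def is_bgzf(data: bytes) -> bool:
    """Return True if `data` starts with a BGZF block header.

    Plain gzip output lacks the 'BC' extra subfield, so tabix rejects it.
    """
    if len(data) < 18 or data[:2] != b"\x1f\x8b":
        return False  # too short, or not gzip at all
    if not data[3] & 0x04:  # FLG.FEXTRA must be set for BGZF
        return False
    xlen = int.from_bytes(data[10:12], "little")
    extra = data[12:12 + xlen]
    i = 0
    while i + 4 <= len(extra):
        si1, si2 = extra[i], extra[i + 1]
        slen = int.from_bytes(extra[i + 2:i + 4], "little")
        if (si1, si2) == (66, 67) and slen == 2:  # 66='B', 67='C'
            return True
        i += 4 + slen  # skip to the next extra subfield
    return False


# Empty BGZF block written by bgzip as an end-of-file marker
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)

if __name__ == "__main__":
    print(is_bgzf(BGZF_EOF))                                   # BGZF block
    print(is_bgzf(gzip.compress(b"##fileformat=VCFv4.2\n")))   # plain gzip
```

To check a real file, read its first bytes, e.g. `with open("file.vcf.gz", "rb") as fh: print(is_bgzf(fh.read(64)))`. If it prints False, decompress and recompress with bgzip before running tabix.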
About your comment on indexing with BWA, use the following command:
bwa index -a bwtsw reference.fa
where -a bwtsw selects the indexing algorithm that can handle genomes as large as the whole human genome. In general, -a sets the index algorithm: bwtsw for long genomes and "is" for short genomes.
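The long-versus-short distinction comes down to total genome size. The minimal Python sketch below sums the bases in a FASTA and picks a value for -a. The cutoff is an assumption: the BWA manual notes that the "is" algorithm does not work for databases larger than 2 GB, approximated here as 2 billion bases. Maize, at over 2 Gb, falls on the bwtsw side; when in doubt, bwtsw is the safe choice.

```python
# Assumption: BWA's manual says 'is' does not work for databases larger
# than 2 GB; this sketch approximates that limit as 2e9 bases.
IS_LIMIT_BASES = 2_000_000_000


def genome_length(fasta_text: str) -> int:
    """Total number of bases in a FASTA string (header lines excluded)."""
    return sum(
        len(line.strip())
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    )


def choose_bwa_algorithm(n_bases: int) -> str:
    """Return a value for `bwa index -a`: 'bwtsw' for long genomes, 'is' otherwise."""
    return "bwtsw" if n_bases > IS_LIMIT_BASES else "is"


if __name__ == "__main__":
    toy_fasta = ">chr1\nACGTACGT\n>chr2\nGGCC\n"  # hypothetical toy genome
    n = genome_length(toy_fasta)
    print(n, choose_bwa_algorithm(n))
```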
Hi! As I told you last Wednesday, you can use STAR to build your index. I'm not sure whether STAR can handle this for maize, but here is the general command:
STAR --runThreadN <number of threads or cores> \
--runMode genomeGenerate \
--genomeDir <path where you will store the index> \
--genomeFastaFiles <path to your FASTA file> \
--sjdbGTFfile <path to your GTF file> \
--sjdbOverhang 99
Hope this tool will be helpful for your analysis.
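The 99 passed to --sjdbOverhang is not arbitrary: the STAR manual recommends max(ReadLength) - 1, so 99 assumes reads of up to 100 bp. The manual also gives a guideline for --genomeSAindexNbases (default 14) that matters for small genomes: min(14, log2(GenomeLength)/2 - 1). Both rules are sketched below in Python; the read lengths and genome sizes in the usage lines are hypothetical, and rounding down to an integer is this sketch's choice.

```python
import math


def sjdb_overhang(max_read_length: int) -> int:
    """--sjdbOverhang guideline from the STAR manual: max(ReadLength) - 1."""
    return max_read_length - 1


def genome_sa_index_nbases(genome_length: int) -> int:
    """--genomeSAindexNbases guideline from the STAR manual:
    min(14, log2(GenomeLength)/2 - 1), rounded down to an integer here."""
    return min(14, math.floor(math.log2(genome_length) / 2 - 1))


if __name__ == "__main__":
    print(sjdb_overhang(100))                     # 100 bp reads
    print(genome_sa_index_nbases(1_000_000))      # a small 1 Mb genome
    print(genome_sa_index_nbases(2_300_000_000))  # a maize-sized genome keeps the default
```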
Here is what I've been trying to run: indexing the reference genome. Even though I've been running it on the lab's server, I have struggled because it is a very BIG genome.
When the indexed genome is done, I'll try to upload it here via WeTransfer.
Here is the code:
## First, clone the repository Deleterious-alleles-in-landraces-of-maize
git clone https://github.com/Duhyadi/Deleterious-alleles-in-landraces-of-maize.git
cd Deleterious-alleles-in-landraces-of-maize
## Then create a directory for the reference genome
mkdir Maize
cd Maize
##Download Reference genome Zea_mays.B73_RefGen_v4.dna_sm.toplevel.fa.gz
#Updated 5/30/19
# From ensembl/plants > Zea mays > Download DNA sequence (FASTA)
wget ftp://ftp.ensemblgenomes.org/pub/plants/release-44/fasta/zea_mays/dna/Zea_mays.B73_RefGen_v4.dna_sm.toplevel.fa.gz
## An easier way to download such big files is to use axel
wget http://wilmer.gaast.net/downloads/axel-1.0b.tar.gz
tar -zxvf axel-1.0b.tar.gz
cd axel-1.0b
./configure
make
make install
## Go back to Maize/ so the downloads land there
cd ..
###Then download the Reference genome 40-60% faster
axel ftp://ftp.ensemblgenomes.org/pub/plants/release-44/fasta/zea_mays/dna/Zea_mays.B73_RefGen_v4.dna_sm.toplevel.fa.gz
##Download Annotation Zea_mays.B73_RefGen_v4.44.gff3.gz
#Updated 6/2/19
# From ensembl/plants > Zea mays > Gene annotation/gff3
wget ftp://ftp.ensemblgenomes.org/pub/plants/release-44/gff3/zea_mays/Zea_mays.B73_RefGen_v4.44.gff3.gz
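## Decompress the genome and annotation (STAR needs uncompressed input)
gunzip Zea_mays.B73_RefGen_v4.dna_sm.toplevel.fa.gz Zea_mays.B73_RefGen_v4.44.gff3.gz
## Create the output directory (run these commands from inside Maize/)
mkdir Star_index
STAR --runThreadN 18 --runMode genomeGenerate --genomeDir Star_index --genomeFastaFiles Zea_mays.B73_RefGen_v4.dna_sm.toplevel.fa --sjdbGTFfile Zea_mays.B73_RefGen_v4.44.gff3 --sjdbOverhang 99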
wget https://github.com/samtools/bcftools/releases/download/1.9/bcftools-1.9.tar.bz2
tar -xjf bcftools-1.9.tar.bz2
cd bcftools-1.9
./configure --prefix=../Maize
make install
cd ..
cd Arteaga_et_al_2016
cd Data
bgzip new_final_26_march.vcf
Hi again!
Looking at the code that Fer left for you, I've realized that the annotations are in GFF3 format. I'm sure you have to add the argument --sjdbGTFtagExonParentTranscript Parent alongside --sjdbGTFfile, because STAR's default parent tag (transcript_id) only works for GTF files.
See you at the next class!
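Why the tag matters: STAR links each exon to its transcript through an attribute in column 9 of the annotation, which is transcript_id in GTF but Parent in GFF3 (hence --sjdbGTFtagExonParentTranscript Parent, as the STAR manual recommends for GFF3). The minimal Python sketch below tells the two attribute syntaxes apart and extracts the parent transcript; the attribute strings are hypothetical examples, not lines from the maize annotation.

```python
def attribute_style(attributes: str) -> str:
    """Guess whether a column-9 attribute string uses GFF3 or GTF syntax.

    GFF3: key=value pairs separated by ';'   e.g. ID=exon1;Parent=transcript:T001
    GTF:  key "value"; pairs                 e.g. gene_id "G1"; transcript_id "T001";
    """
    first = attributes.strip().split(";")[0].strip()
    if "=" in first:
        return "gff3"
    if '"' in first:
        return "gtf"
    return "unknown"


def exon_parent(attributes: str) -> str:
    """Return the exon's transcript: GFF3 'Parent' or GTF 'transcript_id'."""
    style = attribute_style(attributes)
    for field in attributes.strip().rstrip(";").split(";"):
        field = field.strip()
        if style == "gff3" and field.startswith("Parent="):
            return field[len("Parent="):]
        if style == "gtf" and field.startswith("transcript_id "):
            return field.split(" ", 1)[1].strip('"')
    raise KeyError("no parent transcript attribute found")


if __name__ == "__main__":
    print(exon_parent("Parent=transcript:T001;Name=exon1"))
    print(exon_parent('gene_id "G1"; transcript_id "T001"; exon_number "1";'))
```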
bcftools consensus -f ~/Documents/2020_1/Clase_Camille/Fernanda_genoma_indexado/Zea_mays.B73_RefGen_v4.dna_sm.toplevel.fa -s ~/Documents/2020_1/Clase_Camille/Mi_nuevo_repo/Arteaga_et_al_2016/Data/new_final_26_march.vcf.gz -o Fernanda.fa
Hi! In the bcftools manual we realized that -s
needs a sample name, not a file. So you can try this:
bcftools consensus -f ~/Documents/2020_1/Clase_Camille/Fernanda_genoma_indexado/Zea_mays.B73_RefGen_v4.dna_sm.toplevel.fa ~/Documents/2020_1/Clase_Camille/Mi_nuevo_repo/Arteaga_et_al_2016/Data/new_final_26_march.vcf.gz > Fernanda.fa
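If you do want to use -s, it expects one of the sample names stored in the VCF header, which bcftools query -l file.vcf.gz prints. For illustration, the minimal Python sketch below reads the same names from the #CHROM header line (columns 10 onward). It relies on the fact that bgzip output is valid gzip, so gzip.open can read it; the sample names in the test are hypothetical.

```python
import gzip
from typing import Iterable, List


def vcf_sample_names(lines: Iterable[str]) -> List[str]:
    """Return the sample names (columns 10 onward) from the '#CHROM' header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data lines without finding a #CHROM header
    raise ValueError("no #CHROM header line found")


def vcf_gz_sample_names(path: str) -> List[str]:
    """Same as above, for a bgzip-compressed VCF such as new_final_26_march.vcf.gz."""
    with gzip.open(path, "rt") as handle:
        return vcf_sample_names(handle)
```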
Run bgzip file.vcf
and then tabix file.vcf.gz.
bcftools consensus -c -f ../Data/Fernanda_genoma_indexado/Zea_mays.B73_RefGen_v4.dna_sm.toplevel.fa ../Data/new_final_26_march.vcf.gz -o Fernanda_corn.fa
I only got a small FASTA sequence as output.
Here you will find the BioStars question "Extract SNPs flanking sequences based on VCF and genome Fasta files".
I followed the BioStars instructions, which use Pysam. The files I used were the VCF file (containing SNPs) and the reference genome (FASTA). However, I could not carry out the transformation. The following instructions were not clear to me:
Define by how many bases the variant should be flanked.
Iterate over each variant.
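The two steps listed above (choose a flank size, then loop over the variants) can be sketched without pysam. In the minimal Python sketch below, the reference is an in-memory dict and the variants are (CHROM, POS, REF, ALT) tuples; with pysam, pysam.VariantFile would supply the records and pysam.FastaFile(path).fetch(chrom, start, end) the 0-based, half-open slices. Writing the variant as [REF/ALT] is just one common convention, and the toy sequence is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def flanking_sequences(
    reference: Dict[str, str],
    snps: Iterable[Tuple[str, int, str, str]],
    flank: int = 50,
) -> List[Tuple[str, str]]:
    """For each variant (CHROM, 1-based POS, REF, ALT), return (id, sequence).

    The sequence is `flank` bases on each side of the variant, with the
    variant written as [REF/ALT]; flanks are clipped at chromosome ends.
    """
    out = []
    for chrom, pos, ref, alt in snps:
        seq = reference[chrom]
        start = pos - 1  # 0-based index of the first REF base
        left = seq[max(0, start - flank):start]
        right = seq[start + len(ref):start + len(ref) + flank]
        out.append((f"{chrom}:{pos}", f"{left}[{ref}/{alt}]{right}"))
    return out


if __name__ == "__main__":
    toy_reference = {"chr1": "AACCGGTTAA"}
    for name, seq in flanking_sequences(toy_reference, [("chr1", 5, "G", "A")], flank=2):
        print(f">{name}\n{seq}")
```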
The errors can be seen here and here.
I looked in the Pysam manual and I think such a conversion cannot be done, but I really don't know. For now I will stop trying to run the analysis with BAD MUTATIONS.
Hi, can you help me? I would really appreciate it.
My intention (at least for now) is to identify deleterious alleles using a plant-specific tool called BAD MUTATIONS (BLAST Aligned-Deleterious, BAD-M).
Introduction. Installing the dependencies was not easy: I had to familiarize myself with Python, Anaconda, and Bioconda. At first I thought my questions would be about that, but fortunately I managed to move forward. BAD MUTATIONS depends on the following (installed and available in your $PATH, or in sys.path for Python):
To start the analysis, the following input files are required:
The FASTA input:
My questions are specifically about the input files:
Just to experiment and check that the BM installation works, I decided to start a test analysis by converting the VCF file in Arteaga_et_al_2016/Data to FASTA. I know the results BM would give would be wrong, since the VCF does not contain the complete sequences; that is, it does not contain the codons (triplets) required for the analysis. Despite this, I thought it was worth doing. However, the recommendations in the "VCF to FASTA #693" issue are not clear to me. Here is the command:
I think the ideal approach would be to run an "imputation" with specialized software, but I'm not sure about it. Any recommendations?
Thanks for your attention.