gimelbrantlab / ASEReadCounter_star

Preprocessing sequencing data for allele-specific analysis
GNU General Public License v3.0

genome selection #2

Open Hemantcnaik opened 4 years ago

Hemantcnaik commented 4 years ago

Hello, I am trying to run your pipeline for single-cell analysis. I have two crosses: C57 (maternal) x CAST (paternal) and CAST (maternal) x C57 (paternal). In this case, how should I use the command python3 python3/prepare_reference.py? I have a SNP file downloaded from the Mouse Genomes Project (CAST SNP file).

Strausyatina commented 4 years ago

Hello, it looks like your case is "Inbred lines: Separate line(s) vcf with one of them being the reference line itself". Please have a look at the tables in the corresponding sections, "Pseudoreference fasta creation" and "Heterozygous(parental) VCF creation", for the parameters you need. Please also note the footnote: "[1] If one or the alleles in case of inbred lines is reference, then everything should be provided as mat or pat only, consistently." This implies that in the first case (C57 maternal, CAST paternal) you would need to provide your CAST SNP file as --vcf_pat, and in the second case (CAST maternal, C57 paternal) as --vcf_mat.
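The rule in the footnote can be written down as a small lookup. This is only an illustrative sketch: `vcf_flag_for_cross` is a hypothetical helper, not part of ASEReadCounter_star, and it assumes strain names are plain strings with C57 as the reference line. Only the option names `--vcf_pat` and `--vcf_mat` come from the thread.

```python
def vcf_flag_for_cross(maternal, paternal, reference="C57"):
    """Return the option that should carry the non-reference strain's VCF.

    Hypothetical helper: covers only crosses in which exactly one parent
    is the reference line, as in the inbred-lines case discussed above.
    """
    maternal_is_ref = maternal == reference
    paternal_is_ref = paternal == reference
    if maternal_is_ref == paternal_is_ref:
        raise ValueError("exactly one parent must be the reference line")
    # The non-reference parent decides which side the VCF is passed on.
    return "--vcf_pat" if maternal_is_ref else "--vcf_mat"


if __name__ == "__main__":
    print(vcf_flag_for_cross("C57", "CAST"))   # C57 mother, CAST father
    print(vcf_flag_for_cross("CAST", "C57"))   # CAST mother, C57 father
```

The helper raises an error when both or neither parent is the reference, since that situation is outside the case described here.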

Hemantcnaik commented 4 years ago

Hello, thank you for your reply. I tried the command below, but I am getting an error. Can you please help me?

Command used:

python3 ASEReadCounter_star-master/python3/prepare_reference.py --PSEUDOREF True --HETVCF True \
    --pseudoref_dir pseudo \
    --vcf_dir vcf \
    --ref Ref/Mus_musculus.GRCm38.68.dna.toplevel.fa \
    --name_pat CAST_EiJ \
    --vcf_pat CAST_EiJ.mgp.v5.snps.dbSNP142.vcf.gz \
    --gtf Ref/Mus_musculus.GRCm38.99.chr.gtf

Error I am getting:

  File "ASEReadCounter_star-master/python3/prepare_reference.py", line 718, in <module>
    main()
  File "ASEReadCounter_star-master/python3/prepare_reference.py", line 603, in main
    GATK_SelectVariants(r=args.ref, v=args.vcf_mat, o=snp_alt_vcf.name, b=False)
  File "ASEReadCounter_star-master/python3/prepare_reference.py", line 79, in GATK_SelectVariants
    for item in flags[f]:
TypeError: 'NoneType' object is not iterable

Strausyatina commented 4 years ago

Sorry about that. The freshly committed version fixes this bug and should work now.
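For readers who hit the same traceback: the failing call passes args.vcf_mat, while the command only supplied --vcf_pat, so that value was presumably None when the loop tried to iterate over it. The sketch below shows this class of error and one common guard. It is not the actual code or the actual fix in prepare_reference.py; `build_cli_args` and the example option dictionary are hypothetical.

```python
def build_cli_args(flags):
    """Flatten {option: value or list of values} into an argv-style list.

    Hypothetical illustration: options whose value is None (not supplied
    on the command line) are skipped instead of being iterated over.
    """
    argv = []
    for option, values in flags.items():
        if values is None:
            continue  # unset option; iterating over None would raise TypeError
        if isinstance(values, str):
            values = [values]  # treat a single string as a one-item list
        for item in values:
            argv.extend([option, str(item)])
    return argv


if __name__ == "__main__":
    # Iterating over None reproduces the error message from the traceback.
    try:
        for _ in None:
            pass
    except TypeError as err:
        print(err)

    # With the guard, the unset option is simply left out.
    print(build_cli_args({"-R": "ref.fa", "-V": None, "-O": "out.vcf"}))
```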

One more thing, the procedure of creating a heterozygous VCF follows the rules below, in case of Inbred lines:

So you don't need to repeat this operation twice; the heterozygous VCF will be the same. You just need to remember which parent was the actual (or designated) reference and which is the alternative. For C57 (maternal) vs CAST (paternal), maternal is the reference in the resulting count tables at the end of the pipeline and paternal is the alternative; the reverse holds for CAST (maternal) vs C57 (paternal).

This means that python2/allelecounter.py should be given the corresponding reference via --ref: either the maternal pseudoreference or the reference genome itself, respectively.
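Because the same heterozygous VCF serves both crosses, the only bookkeeping left is mapping reference/alternative counts back to parents. The following is a minimal sketch of that mapping, assuming C57 is the reference line; `label_counts` and its `ref_count`/`alt_count` arguments are hypothetical names and do not reflect the pipeline's actual output columns.

```python
def label_counts(ref_count, alt_count, maternal, paternal, reference="C57"):
    """Map reference/alternative allele counts to maternal/paternal counts.

    Hypothetical helper: assumes exactly one parent is the reference line.
    """
    if (maternal == reference) == (paternal == reference):
        raise ValueError("exactly one parent must be the reference line")
    if maternal == reference:
        # e.g. C57 (maternal) x CAST (paternal): reference allele is maternal
        return {"maternal": ref_count, "paternal": alt_count}
    # e.g. CAST (maternal) x C57 (paternal): reference allele is paternal
    return {"maternal": alt_count, "paternal": ref_count}


if __name__ == "__main__":
    print(label_counts(10, 4, maternal="C57", paternal="CAST"))
    print(label_counts(10, 4, maternal="CAST", paternal="C57"))
```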

I should add this clarification to the wiki, thank you!

Please let me know if you have any other questions.

Asia

Hemantcnaik commented 4 years ago

Thank you for your clarification