churchill-lab / g2gtools

Personal diploid genome creation and coordinate conversion
http://churchill-lab.github.io/g2gtools

Location Error and Missing vcf2vci function #22

Closed: exsquire closed this issue 4 years ago

exsquire commented 4 years ago

Hello,

I am trying to use g2gtools to construct the transcriptomes of the eight Diversity Outbred (DO) founder strains. However, I am running into trouble incorporating the indels into the SNP-patched genomes.

I am using the following inputs from Sanger: ftp://ftp-mouse.sanger.ac.uk/

REF=inputs/GRCm38_68.fa
GTF=inputs/Mus_musculus.GRCm38.68.gtf
INDELS=inputs/mgp.v3.indels.rsIDdbSNPv137.vcf.gz 
SNPS=inputs/mgp.v3.snps.rsIDdbSNPv137.vcf.gz
STRAINS="AJ 129S1 NODShiLtJ NZOHlLtJ C57BL6NJ CASTEiJ PWKPhJ WSBEiJ"

The following steps seem to work without issue:

g2gtools vcf2chain -f ${REF} -i ${INDELS} -s ${STRAIN} -o ${STRAIN}/REF-to-${STRAIN}.chain
g2gtools patch -i ${REF} -s ${STRAIN} -v ${SNPS} -o ${STRAIN}/${STRAIN}.patched.fa
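
The two commands above use ${STRAIN}, while STRAINS lists all eight strains, so they are presumably run once per strain. A minimal sketch of that loop, assuming bash and that each ${STRAIN}/ output directory may not exist yet. The `run` helper, `build_all` function and `DRY_RUN` switch are hypothetical additions: by default the script only prints the commands; set `DRY_RUN=` (empty) to actually execute them.

```shell
#!/usr/bin/env bash
# Run vcf2chain and patch for every strain listed in STRAINS.
# DRY_RUN defaults to 1 (print commands only); run with DRY_RUN= to execute.
set -eu

REF=inputs/GRCm38_68.fa
INDELS=inputs/mgp.v3.indels.rsIDdbSNPv137.vcf.gz
SNPS=inputs/mgp.v3.snps.rsIDdbSNPv137.vcf.gz
STRAINS="AJ 129S1 NODShiLtJ NZOHlLtJ C57BL6NJ CASTEiJ PWKPhJ WSBEiJ"
DRY_RUN="${DRY_RUN-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ -n "$DRY_RUN" ]; then
    echo "$*"
  else
    "$@"
  fi
}

build_all() {
  local STRAIN
  for STRAIN in $STRAINS; do   # intentional word splitting on spaces
    # Outputs go to ${STRAIN}/, so make sure the directory exists.
    run mkdir -p "$STRAIN"
    run g2gtools vcf2chain -f "$REF" -i "$INDELS" -s "$STRAIN" \
        -o "$STRAIN/REF-to-$STRAIN.chain"
    run g2gtools patch -i "$REF" -s "$STRAIN" -v "$SNPS" \
        -o "$STRAIN/$STRAIN.patched.fa"
  done
}

build_all
```

Printing first makes it easy to eyeball the per-strain paths before committing to a long run.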

But I get a "Location Error" message for this command:

g2gtools transform -i ${STRAIN}/${STRAIN}.patched.fa -c ${STRAIN}/REF-to-${STRAIN}.chain -o ${STRAIN}/${STRAIN}.fa
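
One hypothesis worth ruling out (an assumption, not a confirmed cause of the "Location Error") is a mismatch between the sequence names in the patched FASTA and those in the chain file. In the UCSC chain format each block starts with a header line `chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id`. The sketch below assumes the reference side of REF-to-${STRAIN}.chain is tName (field 3); `check_chain_names` is a hypothetical helper, not part of g2gtools.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: list sequence names that appear on the reference
# side (tName, field 3) of a UCSC-style chain file but not in a FASTA file.
# Switch $3 to $8 below to check the qName side instead.
check_chain_names() {
  local fasta="$1" chain="$2" missing
  # Guard: the two-file awk idiom below misbehaves if the FASTA is empty.
  if ! grep -q '^>' "$fasta"; then
    echo "no FASTA headers found in $fasta" >&2
    return 1
  fi
  missing="$(awk '
    # First file: remember the ID (first word after ">") of every header.
    FNR == NR { if (/^>/) seen[substr($1, 2)] = 1; next }
    # Second file: report each chain tName missing from the FASTA, once.
    $1 == "chain" && !($3 in seen) && !($3 in reported) { print $3; reported[$3] = 1 }
  ' "$fasta" "$chain")"
  if [ -n "$missing" ]; then
    echo "in chain file but not in FASTA:" >&2
    printf '%s\n' "$missing" >&2
    return 1
  fi
  echo "OK: every chain sequence name is present in the FASTA"
}
```

For example: `check_chain_names "${STRAIN}/${STRAIN}.patched.fa" "${STRAIN}/REF-to-${STRAIN}.chain"`. A naming mismatch such as `1` versus `chr1` would show up as a list of missing names.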

While attempting to resolve the issue, I noticed that I was following instructions from the first version of the documentation: https://g2gtools.readthedocs.io/en/latest/usage.html#to-use-g2gtools-in-command-line

However, when I switched to the second version and tried the vcf2vci feature described here: http://churchill-lab.github.io/g2gtools/#features/1

I received this message:

(g2gtools) esque@farm:~/gbrs$ g2gtools vcf2vci --help
Unrecognized command
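
The vcf2vci command comes from the newer documentation, while vcf2chain/patch/transform come from the older one, so one plausible explanation (an assumption, not confirmed in this thread) is that the installed g2gtools predates vcf2vci. `conda list g2gtools` shows which version is installed in the active environment. The hypothetical helper below checks for a subcommand by looking for the "Unrecognized command" text observed above, rather than relying on any documented flag.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the installed g2gtools accepts a
# subcommand, 1 if it prints "Unrecognized command" (the message seen
# above), and 2 if g2gtools is not on PATH at all.
has_subcommand() {
  command -v g2gtools >/dev/null 2>&1 || return 2
  local out
  # Capture both streams; ignore the exit status and inspect the text.
  out="$(g2gtools "$1" --help 2>&1 || true)"
  case "$out" in
    *"Unrecognized command"*) return 1 ;;
  esac
  return 0
}
```

For example, `has_subcommand vcf2vci || echo "this build only has the vcf2chain/patch/transform workflow"`.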

Please let me know if there is any further information that I can provide. My goal is to make it to the transcriptome extraction section. Any suggestions would be appreciated.

exsquire commented 4 years ago

Edit: Amended the last code chunk, which I had originally run with the gbrs environment activated; the error reproduces in the g2gtools environment.

kbchoi-jax commented 4 years ago

Here are the versions I made, in case you don’t have to stick to the Release 68 annotation or those old Sanger variants. Let me know.

ftp://churchill-lab.jax.org/software/g2gtools/mouse/R84-REL1505/

exsquire commented 4 years ago

That's certainly helpful, thank you. Perhaps you are referring to the files at this link? ftp://churchill-lab.jax.org/pub/software/GBRS/R84-REL1505/, which is defunct on my end (link taken from: https://gbrs.readthedocs.io/en/latest/installation.html).

Also, I just noticed that there is a location parameter (-l) on the 'transform' help page. I suspect it is related to the location error; however, none of the example code I've seen uses the -l parameter.

Final question:

With the data you referred me to, my plan is to assemble the founder transcripts and then concatenate them into a pooled transcriptome:

cat *.transcripts.fa > pooled_transcriptome.fa
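
Because every founder is annotated with the same GTF, plain `cat` produces the same transcript ID once per strain, so reads cannot be attributed to a founder by name. A minimal sketch that appends a strain suffix to each record ID while pooling, assuming the per-strain files are named <STRAIN>.transcripts.fa. The `_<STRAIN>` suffix is an illustrative choice; whether GBRS expects a specific suffix convention (DO founders are often coded A to H) is not established here, which is one reason the prebuilt pooled file from the reply may be the safer route.

```shell
#!/usr/bin/env bash
# Hypothetical helper: concatenate per-strain transcript FASTA files named
# <STRAIN>.transcripts.fa, appending "_<STRAIN>" to every record ID so the
# same Ensembl transcript from different founders keeps a distinct name.
pool_transcripts() {
  local out="$1" fa strain
  shift
  : > "$out"   # start from an empty output file
  for fa in "$@"; do
    strain="$(basename "$fa" .transcripts.fa)"
    # Rewrite only the ID (first word) of header lines; keep any description.
    awk -v s="$strain" '
      /^>/ { id = $1; print id "_" s substr($0, length(id) + 1); next }
      { print }
    ' "$fa" >> "$out"
  done
}
```

Usage: `pool_transcripts pooled_transcriptome.fa *.transcripts.fa`.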

Would this be sufficient to enter the GBRS pipeline at this step:

bowtie -q -a --best --strata --sam -v 3 ${GBRS_DIR}/bowtie.transcriptome ${FASTQ} \
    | samtools view -bS - > ${BAM_FILE}
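
The bowtie command takes an index prefix, not a FASTA file, so a pooled FASTA alone is not enough: an index must exist at ${GBRS_DIR}/bowtie.transcriptome (for your own FASTA, `bowtie-build pooled_transcriptome.fa ${GBRS_DIR}/bowtie.transcriptome` writes one). A small-genome bowtie 1 index consists of six files, <prefix>.1.ebwt through .4.ebwt plus .rev.1.ebwt and .rev.2.ebwt. The hypothetical check below verifies those before aligning; large indexes use .ebwtl and are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: confirm that the six files of a small
# bowtie (v1) index exist and are non-empty for the given prefix.
check_bowtie_index() {
  local prefix="$1" ext missing=0
  for ext in 1 2 3 4 rev.1 rev.2; do
    if [ ! -s "$prefix.$ext.ebwt" ]; then
      echo "missing: $prefix.$ext.ebwt" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_bowtie_index "${GBRS_DIR}/bowtie.transcriptome" && bowtie ...` runs the alignment only when the index is complete.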

kbchoi-jax commented 4 years ago

I thought you wanted the eight founders. Anyway, on the FTP site there is a file, gbrs.hybridized.targets.fa.gz, which is the pooled transcriptome FASTA file. Its bowtie index is gbrs.hybridized.targets.bowtie-index.tar.gz; you need to untar it in your ${GBRS_DIR}.
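
A minimal sketch of the untar step, assuming a gzip-compressed tarball. The prefix stored inside gbrs.hybridized.targets.bowtie-index.tar.gz is not stated in the thread and may differ from `bowtie.transcriptome`, so the hypothetical helper below also prints every index prefix it finds, which is the value to pass to bowtie.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a bowtie index tarball into a directory and
# print each index prefix found (the argument bowtie expects).
unpack_bowtie_index() {
  local tarball="$1" dest="$2" f
  mkdir -p "$dest"
  tar -xzf "$tarball" -C "$dest"
  # Every bowtie v1 index has <prefix>.1.ebwt (and <prefix>.rev.1.ebwt).
  find "$dest" -name '*.1.ebwt' ! -name '*.rev.1.ebwt' | while read -r f; do
    printf '%s\n' "${f%.1.ebwt}"
  done
}
```

Usage: `unpack_bowtie_index gbrs.hybridized.targets.bowtie-index.tar.gz "${GBRS_DIR}"`, then substitute the printed prefix for `${GBRS_DIR}/bowtie.transcriptome` in the bowtie command if the names differ.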

exsquire commented 4 years ago

Thank you for your help!