KarchinLab / probabilistic2020

Simulates somatic mutations, and calls statistically significant oncogenes and tumor suppressor genes based on a randomization-based test
http://probabilistic2020.readthedocs.org
Apache License 2.0

Installation error in a new conda environment #9

Closed by orochimarupap 5 years ago

orochimarupap commented 5 years ago

I created a clean Python 3.6 environment through Anaconda, and when I try to install probabilistic2020 using "conda install -c biobuilds probabilistic2020" the terminal throws this error . . . Solving environment: failed

UnsatisfiableError: The following specifications were found to be incompatible with each other:

ctokheim commented 5 years ago

Hi. Installing probabilistic2020 via the conda command is currently not supported. Can you try "pip install probabilistic2020"?

orochimarupap commented 5 years ago

Installing with pip worked great, thank you. I do have another question and hope it is okay to ask here.

In an attempt to create a gene sequence FASTA I am running into issues when I call the command:

extract_gene_seq -i hg19.fa -b snvboxGenes.bed -o snvboxGenes.fa

This error is thrown:

Traceback (most recent call last):
  File "/anaconda3/envs/2020plus/bin/extract_gene_seq", line 10, in <module>
    sys.exit(cli_main())
  File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/extract_gene_seq.py", line 91, in cli_main
    main(opts)
  File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/extract_gene_seq.py", line 81, in main
    genome_fa = pysam.Fastafile(opts['input'])
  File "pysam/libcfaidx.pyx", line 123, in pysam.libcfaidx.FastaFile.__cinit__
  File "pysam/libcfaidx.pyx", line 155, in pysam.libcfaidx.FastaFile._open
OSError: file hg19.fa not found

The folder containing hg19.fa is in my PATH and I can access the file from the terminal, but when I pass the file to the extract_gene_seq command I run into problems. Why can't the file be found?

orochimarupap commented 5 years ago

Similarly, if I set the folder containing the file as my current working directory, this error is thrown:

[E::fai_build_core] Format error, unexpected "C" at line 1
Traceback (most recent call last):
  File "/anaconda3/envs/2020plus/bin/extract_gene_seq", line 10, in <module>
    sys.exit(cli_main())
  File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/extract_gene_seq.py", line 91, in cli_main
    main(opts)
  File "/anaconda3/envs/2020plus/lib/python3.6/site-packages/prob2020/console/extract_gene_seq.py", line 81, in main
    genome_fa = pysam.Fastafile(opts['input'])
  File "pysam/libcfaidx.pyx", line 123, in pysam.libcfaidx.FastaFile.__cinit__
  File "pysam/libcfaidx.pyx", line 183, in pysam.libcfaidx.FastaFile._open
OSError: error when opening file hg19.fa

ctokheim commented 5 years ago

Can you specify the full path to the file? Also check the contents of hg19.fa to ensure it is not truncated or empty.
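
For readers following along, here is a minimal Python sketch of those two checks. It is not part of probabilistic2020, and the helper name check_fasta is hypothetical. It turns the argument into an absolute path (the shell's PATH is only searched for executables, so a bare file name passed as an argument is looked up relative to the current working directory), then confirms that the file exists, is non-empty, and begins with the ">" character that starts a FASTA header.

```python
import os


def check_fasta(path):
    """Return (absolute_path, problem); problem is None if the file looks like FASTA.

    Hypothetical helper for illustration only, not part of probabilistic2020.
    """
    # Resolve "~" and relative names against the current working directory.
    abs_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(abs_path):
        return abs_path, "file not found"
    if os.path.getsize(abs_path) == 0:
        return abs_path, "file is empty"
    # A plain-text FASTA file starts with a '>' header line.
    with open(abs_path, "rb") as handle:
        first = handle.read(1)
    if first != b">":
        return abs_path, "first byte is %r, expected b'>'" % first
    return abs_path, None
```

If the second element of the result is None, the returned absolute path can be passed to the -i option of extract_gene_seq. This only inspects the first byte, so it will not detect a file that is truncated further in.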

orochimarupap commented 5 years ago

I can specify the full path to hg19.fa when calling extract_gene_seq, but it still results in OSError: error when opening file 'hg19.fa'

When I open hg19.fa in a text editor, the editor crashes. Could there be a problem with the hg19.fa file?

ctokheim commented 5 years ago

It's possible you don't have a complete .fa file. Could you check the file size?

ctokheim commented 5 years ago

Or, if you are on Linux/Mac, type "more hg19.fa" on the command line to see the top of the file.

orochimarupap commented 5 years ago

The file size is 816.2MB. Running "more hg19.fa" shows a combination of data and some artifacts. Is this how the file should read?

C'A^Z^@^@^@^@]^@^@^@^@^@^@^@^Dchr187>^F^@^@^Dchr2<B7<85>^C^Dchr3

^G^Dchr4 ^Dchr5d<97>^M^Dchr6<91>^?^P^Dchr7,'^S^DchrX<9F>^U^Dchr8<9C> ^P^G^X^Dchr92"M^Z^Echr10y|^\^Echr11'<83><97>^^^Echr12,۰ ^Echr13B{"^Echr14p)<8D>$^Echr15^Q]5&^Echr16n'^Echr17:<91>2)^Echr18mv*^Echr20Ū+^DchrY|,^Echr19 <8E>-^Echr22<9F>m{.^Echr21%E/^Nchr6_ssto_hap7 <8F>^G^C0^Mchr6_mcf_hap5ux^V0^Mchr6_cox_hap2S<80>)0^Nchr6_mann_hap4^V<8C><0^Mchr6_apd_hap1n^LO0^Mchr6_qbl_hap6s^La0^Mchr6_dbb_hap3/Rs0^Ochr17_ctg5_hap1<86><8F> <85>0^Nchr4_ctg9_hap1^]M<8C>0^Tchr1_gl000192_random<8E>0^NchrUn_gl0002256Ӑ0^Tchr4_gl000194_random<91>0^Tchr4_gl000193_random^\f<92>0^Tchr9_gl000200_randomT'<93>0^NchrUn_gl000222<93>0^NchrUn_gl000212<94>0^Tchr7_gl000195_random`<95>0^NchrUn_gl0002236^Y<96>0^NchrUn_gl000224і0^NchrUn_gl000219t<85><97>0^Uchr17_gl000205_random9<98>0^NchrUn_gl000215{<98>0^NchrUn_gl000216D<97><99>0^NchrUn_gl000217B<9A>0^Tchr9_gl000199_random<9A>0^NchrUn_gl0002119<99><9B>0^NchrUn_gl0002133B<9C>0^NchrUn_gl000220 <9C>0^NchrUn_gl000218<9A><8D><9D>0^Uchr19_gl000209_random/<9E>0^NchrUn_gl000221<82>О0^NchrUn_gl000214^Tm<9F>0^NchrUn_gl000228R<9F>0^NchrUn_gl000227{0^Tchr1_gl000191_random^@0^Uchr19_gl000208_random1l0^Tchr9_gl000198_random.ǡ0^Uchr17_gl000204_random 0^NchrUn_gl000233pr0^NchrUn_gl000237V0^NchrUn_gl000230Т0^NchrUn_gl0002420^NchrUn_gl000243U*0^NchrUn_gl000241YV0^NchrUn_gl000236C<81>0^NchrUn_gl0002400^Uchr17_gl000206_random^S֣0^NchrUn_gl000232n0^NchrUn_gl000234Y(0^Uchr11_gl000202_randomFQ0^NchrUn_gl000238 z0^NchrUn_gl000244i0^NchrUhg19.fa
ctokheim commented 5 years ago

It looks like you did not convert the original hg19.2bit file to FASTA, so your hg19.fa is still in the binary 2bit format. Please see the twoBitToFa command from the UCSC Genome Browser: https://genome.ucsc.edu/goldenpath/help/twoBit.html .
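
As a rough illustration of this diagnosis, the sketch below checks whether a file begins with the 2bit signature 0x1A412743, which is what shows up as C'A^Z at the top of the "more" output above. The function names is_twobit and convert_to_fasta are hypothetical, and the conversion step assumes the UCSC twoBitToFa binary is already installed and on your PATH.

```python
import struct
import subprocess

# Magic number stored in the first four bytes of a .2bit file.
TWOBIT_SIGNATURE = 0x1A412743


def is_twobit(path):
    """Return True if the file starts with the 2bit signature in either byte order."""
    with open(path, "rb") as handle:
        header = handle.read(4)
    if len(header) < 4:
        return False
    # The signature is written in the byte order of the machine that made the file.
    little = struct.unpack("<I", header)[0]
    big = struct.unpack(">I", header)[0]
    return TWOBIT_SIGNATURE in (little, big)


def convert_to_fasta(twobit_path, fasta_path):
    """Run twoBitToFa (assumed to be on PATH) to write a plain-text FASTA file."""
    subprocess.run(["twoBitToFa", twobit_path, fasta_path], check=True)
```

If is_twobit("hg19.fa") returns True, renaming the file to hg19.2bit and calling convert_to_fasta("hg19.2bit", "hg19.fa") should produce a FASTA file that extract_gene_seq can index.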