KarchinLab / probabilistic2020

Simulates somatic mutations and calls statistically significant oncogenes and tumor suppressor genes based on a randomization-based test
http://probabilistic2020.readthedocs.org
Apache License 2.0

Error in running probabilistic2020 #3

Closed: pradyumnasagar closed this issue 7 years ago

pradyumnasagar commented 7 years ago

When I try to run probabilistic2020 with my data, it fails with the following error:

```
Version: 1.0.7
Command: /usr/local/bin/probabilistic2020 oncogene -i 2020.fa -b 2020sort.bed -m 2020.maf -c 1.5 -p 10 -o oncogene_output_2020.txt
Kept 1107 mutations after droping mutations with missing information (Droped: 0)
Dropped 0 mutations after only keeping Missense_Mutation, Silent, Nonsense_Mutation, Splice_Site, Nonstop_Mutation, Translation_Start_Site. Indels are processed separately.
Dropped 0 mutations after only keeping valid SNVs
Pseudo Random Number Generator Seed: 101
Working on chromosome: chr1 . . .
Working on chromosome: chr3 . . .
Working on chromosome: chr2 . . .
Working on chromosome: chr6 . . .
Working on chromosome: chr19 . . .
Working on chromosome: chr17 . . .
Working on chromosome: chr7 . . .
Working on chromosome: chr8 . . .
Working on chromosome: chr9 . . .
Working on chromosome: chr11 . . .
Chasm context requires a three nucleotide string (Provided: "")
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/utils.py", line 131, in wrapper
    result = f(*args, **kwds)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/console/randomization_test.py", line 51, in singleprocess_permutation
    sc = SequenceContext(gs, seed=opts['seed'])
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/sequence_context.py", line 12, in __init__
    self._init_context(gene_seq)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/sequence_context.py", line 100, in _init_context
    first_context = prob2020.python.mutation_context.get_chasm_context(first_nucs)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/mutation_context.py", line 138, in get_chasm_context
    '(Provided: "{0}")'.format(tri_nuc))
ValueError: Chasm context requires a three nucleotide string (Provided: "")
Finished working on chromosome: chr3.
Finished working on chromosome: chr19.
Finished working on chromosome: chr17.
Finished working on chromosome: chr6.
Chasm context requires a three nucleotide string (Provided: "")
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/utils.py", line 131, in wrapper
    result = f(*args, **kwds)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/console/randomization_test.py", line 51, in singleprocess_permutation
    sc = SequenceContext(gs, seed=opts['seed'])
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/sequence_context.py", line 12, in __init__
    self._init_context(gene_seq)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/sequence_context.py", line 100, in _init_context
    first_context = prob2020.python.mutation_context.get_chasm_context(first_nucs)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/mutation_context.py", line 138, in get_chasm_context
    '(Provided: "{0}")'.format(tri_nuc))
ValueError: Chasm context requires a three nucleotide string (Provided: "")
Finished working on chromosome: chr7.
Finished working on chromosome: chr2.
Finished working on chromosome: chr11.
Finished working on chromosome: chr1.
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/utils.py", line 131, in wrapper
    result = f(*args, **kwds)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/console/randomization_test.py", line 51, in singleprocess_permutation
    sc = SequenceContext(gs, seed=opts['seed'])
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/sequence_context.py", line 12, in __init__
    self._init_context(gene_seq)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/sequence_context.py", line 100, in _init_context
    first_context = prob2020.python.mutation_context.get_chasm_context(first_nucs)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/python/mutation_context.py", line 138, in get_chasm_context
    '(Provided: "{0}")'.format(tri_nuc))
ValueError: Chasm context requires a three nucleotide string (Provided: "")
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/probabilistic2020", line 11, in <module>
    sys.exit(cli_main())
  File "/usr/local/lib/python3.4/dist-packages/prob2020/console/probabilistic2020.py", line 262, in cli_main
    main(opts)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/console/probabilistic2020.py", line 210, in main
    result_df = rt.main(opts, mutation_df)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/console/randomization_test.py", line 389, in main
    permutation_result = multiprocess_permutation(bed_dict, mut_df, opts)
  File "/usr/local/lib/python3.4/dist-packages/prob2020/console/randomization_test.py", line 152, in multiprocess_permutation
    for chrom_result in process_results:
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 689, in next
    raise value
ValueError: Chasm context requires a three nucleotide string (Provided: "")
```

pradyumnasagar commented 7 years ago

I got the same error with the sample data as well.

ctokheim commented 7 years ago

Did the "tsg" command fail for you too?

ctokheim commented 7 years ago

This is kind of an odd error. I don't get it when I run version 1.0.7 on the quick start example. Could you run the package's unit tests to see whether you get the same error message? The unit tests are run continuously on Python 2.7 and 3.5 and have not shown this error. If the unit tests do fail, that might point to an installation problem, or at least help me debug what is happening.

This is how you would run the unit tests:

  1. Uninstall the existing probabilistic2020 package: `pip uninstall probabilistic2020`
  2. Install the nose Python package for running unit tests: `pip install nose` (hopefully this installs the latest version, 1.3.7).
  3. Download the probabilistic2020 source code: `wget https://github.com/KarchinLab/probabilistic2020/archive/v1.0.7.tar.gz`
  4. Extract the source code: `tar xvzf v1.0.7.tar.gz ; cd probabilistic2020-1.0.7`
  5. Build the source code: `make build`
  6. Run the unit tests: `make tests`
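Because an installation problem is one of the suspects, it can also help to confirm which Python interpreter a command runs under and which installed copy of a package it actually imports. A minimal sketch using only the standard library; `describe_import` is a hypothetical helper, not part of probabilistic2020:

```python
import importlib
import sys


def describe_import(module_name: str) -> str:
    """Report the running interpreter and the file `module_name` is loaded from."""
    module = importlib.import_module(module_name)
    # Built-in modules (e.g. sys) have no __file__ attribute.
    location = getattr(module, "__file__", None) or "<built-in>"
    return "{} (Python {}) imports {} from {}".format(
        sys.executable, sys.version.split()[0], module_name, location)
```

For example, running `print(describe_import("prob2020"))` once under `python` and once under `python3` shows whether both interpreters pick up the same copy of the `prob2020` package seen in the tracebacks.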
pradyumnasagar commented 7 years ago

Yes, the tsg command gave the same error too: `ValueError: Chasm context requires a three nucleotide string (Provided: "")`

pradyumnasagar commented 7 years ago

No errors were reported when I ran the unit tests, but there were some warnings:

```
make tests
nosetests --nologcapture tests/
.[fai_load] build FASTA index.
/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.py:296: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)
/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.py:476: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
.[fai_load] build FASTA index.
/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/python/count_frameshifts.py:47: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  gene_df['unmapped'] = [(1 if x is None else 0) for x in fs_pos]
..[fai_load] build FASTA index.
.[fai_load] build FASTA index.
..[fai_load] build FASTA index.
/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/python/p_value.py:22: RuntimeWarning: divide by zero encountered in log
  chisq_stat = np.sum(-2*np.log(pvals))
.......[fai_load] build FASTA index.
/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/python/indel.py:177: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  mut_df['indel type'] = ''
/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/python/mutation_context.py:83: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  mut_info['Coding Position'] = pos_list
/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/python/mymath.py:19: RuntimeWarning: divide by zero encountered in log2
  return -np.sum(np.where(p!=0, p*np.log2(p), 0))
/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/python/mymath.py:19: RuntimeWarning: invalid value encountered in multiply
  return -np.sum(np.where(p!=0, p*np.log2(p), 0))
.

Ran 15 tests in 267.090s

OK
```
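Both RuntimeWarnings in this output come from taking a logarithm of zero. In `p_value.py`, `np.log(pvals)` warns when some p-value is exactly 0. In `mymath.py`, `np.where(p!=0, p*np.log2(p), 0)` evaluates `p*np.log2(p)` for every element before selecting, so a zero entry first yields `-inf` (the divide-by-zero warning) and then `0 * -inf = nan` (the invalid-value warning); `np.where` then replaces those entries with 0, so the returned entropy is still correct. A sketch of an entropy helper that avoids the warnings by dropping zeros before the log; `entropy_bits` is a hypothetical name and this is an illustrative rewrite, not the package's code:

```python
import numpy as np


def entropy_bits(p) -> float:
    """Shannon entropy (base 2) of a probability vector, treating 0*log2(0) as 0.

    Only the strictly positive entries are passed to log2, so NumPy never
    evaluates log2(0) and emits no divide-by-zero or invalid-value warning.
    """
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return float(-np.sum(nonzero * np.log2(nonzero)))
```

Masking before the log gives the same result as the `np.where` form for valid probability vectors; the only difference is that no intermediate `-inf` or `nan` is ever computed.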

ctokheim commented 7 years ago

So you ran the quick start command verbatim? Do you get the error when you replace the quick start example "probabilistic2020 tsg" with "prob2020/console/probabilistic2020.py tsg" (from the source code you download)?

pradyumnasagar commented 7 years ago

Yes, I get the same error when I run it from source with my data:

`prob2020/console/probabilistic2020.py tsg -i 2020.fa -b 2020.bed -m 2020.maf -o 2020testout.txt`

```
Version: 1.0.7
Command: prob2020/console/probabilistic2020.py tsg -i 2020.fa -b 2020.bed -m 2020.maf -o 2020testout.txt
Kept 1107 mutations after droping mutations with missing information (Droped: 0)
Dropped 33 mutations after only keeping Missense_Mutation, Silent, Nonsense_Mutation, Splice_Site, Nonstop_Mutation, Translation_Start_Site. Indels are processed separately.
Dropped 0 mutations after only keeping valid SNVs
Pseudo Random Number Generator Seed: 101
Working on chromosome: chr1 . . .
Finished working on chromosome: chr1.
Working on chromosome: chr3 . . .
Finished working on chromosome: chr3.
Working on chromosome: chr2 . . .
Finished working on chromosome: chr2.
Working on chromosome: chr6 . . .
Finished working on chromosome: chr6.
Working on chromosome: chr19 . . .
Finished working on chromosome: chr19.
Working on chromosome: chr17 . . .
Finished working on chromosome: chr17.
Working on chromosome: chr7 . . .
Finished working on chromosome: chr7.
Working on chromosome: chr8 . . .
Chasm context requires a three nucleotide string (Provided: "")
Traceback (most recent call last):
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/python/utils.py", line 131, in wrapper
    result = f(*args, **kwds)
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/console/randomization_test.py", line 51, in singleprocess_permutation
    sc = SequenceContext(gs, seed=opts['seed'])
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/python/sequence_context.py", line 12, in __init__
    self._init_context(gene_seq)
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/python/sequence_context.py", line 100, in _init_context
    first_context = prob2020.python.mutation_context.get_chasm_context(first_nucs)
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/python/mutation_context.py", line 138, in get_chasm_context
    '(Provided: "{0}")'.format(tri_nuc))
ValueError: Chasm context requires a three nucleotide string (Provided: "")
Traceback (most recent call last):
  File "prob2020/console/probabilistic2020.py", line 266, in <module>
    cli_main()
  File "prob2020/console/probabilistic2020.py", line 262, in cli_main
    main(opts)
  File "prob2020/console/probabilistic2020.py", line 210, in main
    result_df = rt.main(opts, mutation_df)
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/console/randomization_test.py", line 395, in main
    frameshift_df, p_inactivating)
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/console/randomization_test.py", line 163, in multiprocess_permutation
    result_list += singleprocess_permutation(info)
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/python/utils.py", line 131, in wrapper
    result = f(*args, **kwds)
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/console/randomization_test.py", line 51, in singleprocess_permutation
    sc = SequenceContext(gs, seed=opts['seed'])
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/python/sequence_context.py", line 12, in __init__
    self._init_context(gene_seq)
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/python/sequence_context.py", line 100, in _init_context
    first_context = prob2020.python.mutation_context.get_chasm_context(first_nucs)
  File "/home/mlscl3/2020/probabilistic2020-1.0.7/prob2020/console/../../prob2020/python/mutation_context.py", line 138, in get_chasm_context
    '(Provided: "{0}")'.format(tri_nuc))
ValueError: Chasm context requires a three nucleotide string (Provided: "")
```

ctokheim commented 7 years ago

I'm guessing 2020.fa, 2020.bed, and 2020.maf are your own data. Can you run the command on the quick start example data?

pradyumnasagar commented 7 years ago

Thanks for the help. It ran perfectly with the quick start example data, so I changed my data format accordingly, and now it works fine.

ctokheim commented 7 years ago

Could you tell me what the data format problem was? I might be able to update the documentation or modify the code to give a more informative error message.
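The thread does not record what the data format problem was. As a hedged starting point: the tracebacks show `get_chasm_context` receiving an empty string where it expects three nucleotides, which suggests (an assumption, not confirmed here) that some gene's sequence came back empty or too short. A minimal pre-flight check along those lines, assuming the gene names in column 4 of the BED file are meant to match the FASTA headers; check the probabilistic2020 documentation for the actual input requirements. `read_fasta` and `check_inputs` are hypothetical helpers, not part of the package:

```python
import io
from typing import Dict, List, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}; name is the first
    whitespace-delimited token of each '>' header line."""
    chunks = {}  # name -> list of sequence lines
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(lines) for key, lines in chunks.items()}


def check_inputs(bed: TextIO, fasta: TextIO, min_len: int = 3) -> List[str]:
    """List problems found when matching BED records to FASTA sequences.

    ASSUMPTION: BED column 4 (name) should equal a FASTA header name, and
    each sequence needs at least `min_len` nucleotides (3, per the error).
    """
    seqs = read_fasta(fasta)
    problems = []
    for lineno, line in enumerate(bed, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # blank, comment, or UCSC header lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            problems.append("BED line {}: fewer than 4 tab-separated columns".format(lineno))
            continue
        gene = fields[3]
        seq = seqs.get(gene)
        if seq is None:
            problems.append("{}: not found among FASTA headers".format(gene))
        elif len(seq) < min_len:
            problems.append("{}: FASTA sequence has only {} nt".format(gene, len(seq)))
    return problems


if __name__ == "__main__":
    # Toy in-memory inputs; for real data, pass open("2020.bed") and open("2020.fa").
    bed = io.StringIO("chr1\t100\t200\tGENE_A\t0\t+\n"
                      "chr2\t50\t80\tGENE_B\t0\t-\n")
    fasta = io.StringIO(">GENE_A\nATGGCC\n>GENE_B\n\n")
    for problem in check_inputs(bed, fasta):
        print(problem)  # prints: GENE_B: FASTA sequence has only 0 nt
```

An empty result does not prove the inputs are valid for probabilistic2020; it only rules out the missing-or-empty-sequence case sketched here.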