Biobix / proteoformer

Integration ribosome profiling - mass spectrometry
GNU General Public License v3.0
14 stars 7 forks source link

Proteoformer "Hangs" on Convert to assembly table features during PRICE.py #1

Closed schaughencyp closed 5 years ago

schaughencyp commented 5 years ago

Hello,

I am trying to test proteoformer on some Riboseq data that I have. It seems like the PRICE.py is hanging at the Convert to assembly table features. I mean, that is unless it is supposed to take longer the 36 hours to run that step.

Any help would be appreciated.

Thanks,

Paul Schaughency

Here is the log file:

##########################

ORF Calling with PRICE

##########################

Info about PRICE: https://github.com/erhard-lab/gedi/wiki/Price

Parameters: tmp /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/ result_db SQLite/results_ribo.db workdir /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1 max_ram 0 fdr 0.1 Parameters from arguments table: mapper STAR uniq Y bam_tr /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/STAR/fastq2/price_treat.bam readtype ribo species human bam_untr /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/STAR/fastq1/price_untreat.bam tr_calling_method rule_based run_name HDD1_Ribo_seq_S1 cores 16 ensdb /data/NCBR/projects/NCBR-28/proteoformer/ref/ENS_hsa_96.db igenomes_root /data/NCBR/projects/NCBR-28/proteoformer/ref/igenomes ens_v 96

Get chromosomes

Binary chromosomes already exist

[0:01:36]: Install PRICE [0:01:38]: COMPLETE: PRICE installation

[0:01:38]: Prepare PRICE genome reference 2019-04-23 17:22:25.098 INFO Command: gedi -e IndexGenome -s /data/NCBR/projects/NCBR-28/proteoformer/ref/igenomes/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -a /data/NCBR/projects/NCBR-28/proteoformer/ref/igenomes/Homo_sapiens/Ensembl/GRCh38/Annotation/Genes/genes_96.gtf -n homo_sapiens_96 -nobowtie -nostar -nokallisto 2019-04-23 17:22:25.441 INFO Discovering classes in classpath 2019-04-23 17:22:25.616 INFO Preparing simple class references 2019-04-23 17:22:25.744 INFO Gedi 1.0.2 (JAR) startup Written genomic info to /home/schaughencypm/.gedi/genomic/homo_sapiens_96.oml 2019-04-23 17:22:26.364 INFO Finished: gedi -e IndexGenome -s /data/NCBR/projects/NCBR-28/proteoformer/ref/igenomes/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -a /data/NCBR/projects/NCBR-28/proteoformer/ref/igenomes/Homo_sapiens/Ensembl/GRCh38/Annotation/Genes/genes_96.gtf -n homo_sapiens_96 -nobowtie -nostar -nokallisto /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE//Price_1.0.2/gedi -e IndexGenome -s /data/NCBR/projects/NCBR-28/proteoformer/ref/igenomes/Homo_sapiens/Ensembl/GRCh38/Sequence/WholeGenomeFasta/genome.fa -a /data/NCBR/projects/NCBR-28/proteoformer/ref/igenomes/Homo_sapiens/Ensembl/GRCh38/Annotation/Genes/genes_96.gtf -n homo_sapiens_96 -nobowtie -nostar -nokallisto [0:01:40]: COMPLETE: PRICE genome reference

[0:01:40]: Run PRICE 2019-04-23 17:22:26.698 INFO Command: gedi -e Price -reads /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE//reads.bamlist -genomic homo_sapiens_96 -prefix /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output -nthreads 16 -fdr 0.1 -percond 2019-04-23 17:22:26.870 INFO Discovering classes in classpath 2019-04-23 17:22:26.947 INFO Preparing simple class references 2019-04-23 17:22:27.022 INFO Gedi 1.0.2 (JAR) startup 2019-04-23 17:22:27.229 INFO Reading oml /home/schaughencypm/.gedi/genomic/homo_sapiens_96.oml 2019-04-23 17:22:27.239 INFO Done reading oml /home/schaughencypm/.gedi/genomic/homo_sapiens_96.oml 2019-04-23 17:22:28.550 INFO Estimating maxpos... 2019-04-23 17:22:28.559 INFO Clustering reads 2019-04-23 17:29:06.346 INFO Found 393032 clusters without looking at annotation 2019-04-23 17:29:08.455 INFO Found 142750 clusters after looking at annotation 2019-04-23 17:29:08.605 INFO Found 29137 clusters after filtering with read/region count 2019-04-23 17:30:37.067 INFO Using maxpos=12 (0) 2019-04-23 17:30:37.068 INFO Estimate parameters 2019-04-23 17:30:55.131 INFO LL=-1.01674e+08 2019-04-23 17:30:55.145 INFO Using maxpos=12 (1) 2019-04-23 17:30:55.145 INFO Estimate parameters Correcting from 21 to 12 (desired: 12) 2019-04-23 17:31:21.779 INFO LL=-1.51509e+07 2019-04-23 17:31:21.800 INFO Codon inference 2019-04-23 17:38:51.129 INFO Calibrate noise model 2019-04-23 17:38:51.129 INFO Train start prediction 2019-04-23 17:38:51.129 INFO Writing codon index 2019-04-23 17:40:01.105 INFO Considering 10827 ORFs for start training [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 2292.6 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 13 iterations: 2292.3 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 2411.3 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 13 iterations: 2411.3 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1960.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 13 iterations: 1960.8 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 2151.2 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 14 iterations: 2149.8 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1678.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 14 iterations: 1678.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1969.8 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 13 iterations: 1969.7 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1509.0 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 13 iterations: 1508.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1858.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 15 iterations: 1857.4 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1459.2 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 14 iterations: 1458.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1778.0 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 14 iterations: 1776.5 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1335.4 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 14 iterations: 1335.1 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1615.8 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 14 iterations: 1615.4 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1265.4 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 14 iterations: 1264.5 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1532.7 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 14 iterations: 1530.7 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1179.3 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 15 iterations: 1177.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1527.8 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 15 iterations: 1525.7 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1119.4 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 15 iterations: 1116.6 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 4158.9 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1431.1 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 15 iterations: 1427.8 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 7598.3 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 1907.3 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 18 iterations: 1890.7 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: initial function value: 7598.3 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 10 iterations: 2758.1 [GediProgram-1-thread-19] INFO smile.math.Math - L-BFGS: the function value after 14 iterations: 2757.8 2019-04-23 17:40:02.588 INFO Infer ORFs 2019-04-23 17:40:39.031 INFO Writing viewer index 2019-04-23 17:50:11.076 INFO Found 81857 ORFs 2019-04-23 17:50:11.089 INFO Multiple testing correction and filtering 2019-04-23 17:50:24.883 INFO Remaining after multiple testing correction: 26786 ORFs 2019-04-23 17:50:24.911 INFO Reassign codons 2019-04-23 17:50:55.417 INFO Removing temp file /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output.clusters.cit 2019-04-23 17:50:55.418 INFO Removing temp file /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output.codons.bin 2019-04-23 17:50:55.418 INFO Removing temp file /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output.start.model 2019-04-23 17:50:55.419 INFO Removing temp file /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output.noise.model 2019-04-23 17:50:55.424 INFO Removing temp file /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output.orfs.pvals 2019-04-23 17:50:55.426 INFO Removing temp file /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output.orfs.bin 2019-04-23 17:50:55.469 INFO Removing temp file /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output.tmp.orfs.cit 2019-04-23 17:50:55.472 INFO Finished: gedi -e Price -reads /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE//reads.bamlist -genomic homo_sapiens_96 -prefix /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output -nthreads 16 -fdr 0.1 -percond /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE//Price_1.0.2/gedi -Dsmile.threads=1 -e Price -reads /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE//reads.bamlist -genomic homo_sapiens_96 -prefix /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output -nthreads 16 -fdr 0.1 -percond

[0:30:09]: COMPLETE: PRICE run

[0:30:09]: Convert CIT file 2019-04-23 17:50:55.933 INFO Command: gedi -e ViewCIT -o /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output.orfs.loc -m Location /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output.orfs.cit 2019-04-23 17:50:56.877 INFO Discovering classes in classpath 2019-04-23 17:50:57.017 INFO Preparing simple class references 2019-04-23 17:50:57.111 INFO Gedi 1.0.2 (JAR) startup /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE//Price_1.0.2/gedi -e ViewCIT -o /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output.orfs.loc -m Location /gpfs/gsfs11/users/NCBR/projects/NCBR-28/proteoformer/test/HDD1_Ribo_seq_S1/tmp/PRICE/output.orfs.cit [0:30:12]: COMPLETE: CIT file conversion

[0:30:12]: Parse PRICE output [0:30:14]: COMPLETE: PRICE output parsing

[0:30:14]: Convert to assembly table features

StevenVerbruggen commented 5 years ago

Hi Paul,

Thanks for your interest.

Both PRICE and SPECtre take quite an amount of time. That is because in the end, all results need to be brought back on one core (for posterior error calculation, all results should be in the same workspace) and on this one core, all features need to be converted to the features as required by PROTEOFORMER. This conversion requires a lot of Ensembl annotation information to be checked to construct these features, all on one core. Therefore, for big datasets, PRICE can require up to 70 hours to run in total, especially the assembly conversion part that is running on one core. For the rest, the log files looks completely ok. PRICE and SPECtre are not our own tools, so it is quite difficult to change or upgrade this procedure. For faster calculations, I would advise you to use the TIS-calling+SNP-calling+assembly combination (which is in-house), unless you do not have a LTM/HARR sample at hand.

If you have any more questions, I am glad to hear them.

All the best, Steven