bartongroup / yanocomp

Yet another nanopore modification comparison tool
MIT License
11 stars 1 forks source link

error during the prep step #8

Closed JBerthelier closed 2 years ago

JBerthelier commented 2 years ago

Dear Barton lab,

We are trying to make works yanocomp with Arabidobsis DRS data you published Col-0 and vir-1.

However, we got these two errors during the prep step for the dataset:

First:


Traceback (most recent call last):

File "/home/e/work/miniconda3/envs/yanocomp/bin/yanocomp", line 8, in

sys.exit(cli())

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1128, in call

return self.main(*args, **kwargs)

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1053, in main

rv = self.invoke(ctx)

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1659, in invoke

return _process_result(sub_ctx.command.invoke(sub_ctx))

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1395, in invoke

return ctx.invoke(self.callback, **ctx.params)

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 754, in invoke

return __callback(*args, **kwargs)

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/opts.py", line 16, in _make_dataclass

return cmd(dynamic_dataclass(cls_name, bases=bases, **cli_kwargs))

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/prep.py", line 182, in nanopolish_collapse

save_events_to_hdf5(

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/io.py", line 180, in save_events_to_hdf5

for gene_id, chrom, strand, records in collapsed:

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/prep.py", line 154, in parallel_collapse

for chunk, gene_info in pool.imap_unordered(_collapse_func, ea_chunker):

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/multiprocessing/pool.py", line 870, in next

raise value

ValueError: could not convert string to float: '0.00TCTAC'


Second

Traceback (most recent call last):

File "/home/e/work/miniconda3/envs/yanocomp/bin/yanocomp", line 8, in

sys.exit(cli())

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1128, in call

return self.main(*args, **kwargs)

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1053, in main

rv = self.invoke(ctx)

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1659, in invoke

return _process_result(sub_ctx.command.invoke(sub_ctx))

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1395, in invoke

return ctx.invoke(self.callback, **ctx.params)

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 754, in invoke

return __callback(*args, **kwargs)

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/opts.py", line 16, in _make_dataclass

return cmd(dynamic_dataclass(cls_name, bases=bases, **cli_kwargs))

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/prep.py", line 182, in nanopolish_collapse

save_events_to_hdf5(

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/io.py", line 180, in save_events_to_hdf5

for gene_id, chrom, strand, records in collapsed:

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/prep.py", line 154, in parallel_collapse

for chunk, gene_info in pool.imap_unordered(_collapse_func, ea_chunker):

File "/home/e/work/miniconda3/envs/yanocomp/lib/python3.9/multiprocessing/pool.py", line 870, in next

raise value

KeyError: '70.71'


By chance, do you have any idea of the issue ?

We are now trying to re-run the prep step without the GTF option "-g"

Best regards,

Jeremy

JBerthelier commented 2 years ago

Also this error showed up after getting a quite big .hdf5 file

please let me know if you need extra info

we used this command for minimap2

minimap2 -ax map-ont -uf -t 3 --secondary=no

Best regards

Jeremy

mparker2 commented 2 years ago

Hi Jeremy, I'm sorry that you got this error & that it happened so far into the analysis. It is always very frustrating when that happens! I will try to work out what the problem is. Did you get this error for all replicates or just for a subset?

One thing I should mention is that the Col-0 and vir-1 data from the eLife paper were not grown and sequenced at the same time, and so may not be directly comparable due to technical variation. We sequenced the VIR complemented line at the same time as vir-1 for this purpose.

mparker2 commented 2 years ago

The first error sounds like it could result from a malformed record in your nanopolish eventalign file. Were you piping the output of nanopolish directly to yanocomp prep? If not, could you grep the eventalign file for lines containing the "0.00TCTAC" to see if you have any malformed records?

mparker2 commented 2 years ago

I wonder if a malformed nanopolish file or unexpected nanopolish eventalign format could be causing the second error as well. Please could you describe how you run nanopolish?

JBerthelier commented 2 years ago

Dear Matthew,

Thank you very much for your quick support! No problem for the error, I know that this is common for incoming tool, and also I may have done mistakes.

I understand the issue by using the DRS Col-0 data, and we are now using VIRc as control, thank you for the warning.

We have tried to greg the "0.00TCTAC" from the .hdf5 output but no finding.

Actually, we didn`t piped the output of nanopolish directly to yanocomp prep (as you suggested on your protocol). We are now following your protocol and piping the output of nanopolish directly to yanocomp prep to see if it fix that. At now we are only trying with one replicate for each condition (ctrl vs vir1).

I keep you posted.

mparker2 commented 2 years ago

Hi Jeremy, sorry I meant grep from the nanopolish eventalign output, not from the hdf5 file. Can you describe how you ran nanopolish?

JBerthelier commented 2 years ago

Dear Matthew, Here are some update.

We manage to make works the prep step and get .hdh5 for VIRc and VIR-1 data. we did two differents transcriptome-base analyses

-First: with Araport11 and yanocomp prep was failing (error showed in my previous messages). The gtf looks like that : Chr1 Araport11 gene 3631 5899 . + . transcript_id "AT1G01010"; gene_id "AT1G01010";

-Second: we tried with a home made transcriptome and yanocomp prep finished for all DRS replicate data. The gtf looks like that : Chr1 homemade transcript 3621 5899 1000 + . gene_id "gene.1"; transcript_id "gene.1.1";

It seems that something goes wrong regarding to the GTF format. Also for Araport11 analayses, yanocomp prep was able to complete without the gtf option (-g)

JBerthelier commented 2 years ago

Sorry for the grep of the eventalign output, yes I did to fast and I checked in the wrong file, now it is deleted but I could re-do it and let you know later.

JBerthelier commented 2 years ago

Dear Matthew,

Now we are trying to use gmmtest with this command on our cluster:

yanocomp gmmtest -p 12 -c VIRc-1.hdf5 -c VIRc-2.hdf5 -c VIRc-3.hdf5 -c VIRc-4.hdf5 -t vir1-1.hdf5 -t vir1-2.hdf5 -t vir1-3.hdf5 -t vir1-4.hdf5 -o YGMM_output -s YGMM_output.json.gz

but we immediatly got this error below, I tried to check online but I do not found any clues. Do you have an idea of the issues?

Thanks again,


2021-12-02 10:05:37,990 WARNING Default min depth set to 6 to match window size 3 2021-12-02 10:05:37,994 INFO Running gmmtest in 3-comp GMM (uniform outliers) mode with 4 control datasets and 4 treatment datasets 2021-12-02 10:05:38,153 INFO 15,465 genes to be processed on 12 workers /home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/stats.py:23: RuntimeWarning: Mean of empty slice. mu = X.mean(axis=0) /home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/numpy/core/_methods.py:181: RuntimeWarning: invalid value encountered in true_divide ret = um.true_divide( /home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/stats.py:24: RuntimeWarning: Mean of empty slice. sig = np.sqrt(((X - mu) 2.0).mean(0)) multiprocessing.pool.RemoteTraceback: """ Traceback (most recent call last): File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/multiprocessing/pool.py", line 125, in worker result = (True, func(args, kwds)) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 164, in test_chunk was_tested, result, sm = position_stats( File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/stats.py", line 455, in position_stats r.ks_stat, ks_p_val = pca_kstest( File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/stats.py", line 71, in pca_kstest comps = pca_comp1(pooled) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/stats.py", line 28, in pca_comp1 i1 = np.argmax(s 2.0) File "<__array_function__ internals>", line 5, in argmax File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 1195, in argmax return _wrapfunc(a, 'argmax', axis=axis, out=out) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc return bound(args, kwds) ValueError: attempt to get argmax of an empty sequence """

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/home/user/miniconda3/envs/yanocomp/bin/yanocomp", line 8, in sys.exit(cli()) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1128, in call return self.main(args, kwargs) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1053, in main rv = self.invoke(ctx) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1659, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1395, in invoke return ctx.invoke(self.callback, ctx.params) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 754, in invoke return __callback(args, kwargs) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/opts.py", line 16, in _make_dataclass return cmd(dynamic_dataclass(cls_name, bases=bases, cli_kwargs)) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 333, in gmm_test res, sm_preds = parallel_test(opts) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 204, in parallel_test for chunk_res, chunk_sm_preds in pool.imap_unordered( File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/multiprocessing/pool.py", line 870, in next raise value ValueError: attempt to get argmax of an empty sequence

mparker2 commented 2 years ago

First: with Araport11 and yanocomp prep was failing (error showed in my previous messages).

Please can you post the GTF file here or email it to me (m.t.parker@dundee.ac.uk) so I can investigate. I can't see anything different with that file for the line you posted but the gtf parser in yanocomp is pretty rudimentary so if there is something non-standard about the file it might be causing issues. What was the error you got when using that file?

Now we are trying to use gmmtest with this command on our cluster:

yanocomp gmmtest -p 12 -c VIRc-1.hdf5 -c VIRc-2.hdf5 -c VIRc-3.hdf5 -c VIRc-4.hdf5 -t vir1-1.hdf5 -t vir1-2.hdf5 -t vir1-3.hdf5 -t vir1-4.hdf5 -o YGMM_output -s YGMM_output.json.gz

but we immediatly got this error below ValueError: attempt to get argmax of an empty sequence

This is a interesting one... it seems that somehow the code ends up testing a position with a sample size window size of zero. Can you try this:

yanocomp gmmtest -c VIRc-1.hdf5 -c VIRc-2.hdf5 -c VIRc-3.hdf5 -c VIRc-4.hdf5 -t vir1-1.hdf5 -t vir1-2.hdf5 -t vir1-3.hdf5 -t vir1-4.hdf5 -o YGMM_output --test-gene AT1G29070

to see if it is a general problem or if it is caused by a specific position in the input. The above command should only test one gene.

mparker2 commented 2 years ago

Please can you also run conda list in the conda environment you are using and post the output here.

JBerthelier commented 2 years ago

Dear Matthew,

Thank you for the quick support. Ok I send you the Araport11 annotation I am using by e-mail.

We have ran gmmtest at gene level and it works for AT1G29070 (in our case with the id MSTRG.2989)

We also tried other genes and it works too. We got a bed file with expected output.

here is the log


2021-12-03 09:53:34,563 WARNING Default min depth set to 6 to match window size 3 2021-12-03 09:53:34,572 INFO Running gmmtest in 3-comp GMM (uniform outliers) mode with 4 control datasets and 4 treatment datasets 2021-12-03 09:53:34,572 INFO Testing gene MSTRG.2989 2021-12-03 09:53:44,234 INFO Complete. Tested 24 positions 2021-12-03 09:53:44,238 INFO 12 positions significant at 5% level 2021-12-03 09:53:44,264 INFO Writing output to /flash/user/yanocomp_gmmtest/homemade_transcriptome/YGMM_output.bed


Regarding conda list this is what we got

packages in environment at /home/user/miniconda3/envs/yanocomp: Name Version Build Channel

_libgcc_mutex 0.1 conda_forge conda-forge

_openmp_mutex 4.5 1_gnu conda-forge

c-ares 1.18.1 h7f98852_0 conda-forge

ca-certificates 2021.10.8 ha878542_0 conda-forge

certifi 2021.10.8 py39hf3d152e_1 conda-forge

click 8.0.3 py39hf3d152e_0 conda-forge

click-log 0.3.2 pypi_0 pypi

cycler 0.11.0 pyhd8ed1ab_0 conda-forge

freetype 2.10.4 h0708190_1 conda-forge

h5py 2.10.0 nompi_py39h98ba4bc_106 conda-forge

hdf5 1.10.6 nompi_h6a2412b_1114 conda-forge

jbig 2.1 h7f98852_2003 conda-forge

joblib 0.17.0 py_0 conda-forge

jpeg 9d h36c2ea0_0 conda-forge

kiwisolver 1.3.2 py39h1a9c180_0 conda-forge

krb5 1.19.2 hcc1bbae_2 conda-forge

lcms2 2.12 hddcbb42_0 conda-forge

ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge

lerc 3.0 h9c3ff4c_0 conda-forge

libblas 3.9.0 12_linux64_openblas conda-forge

libcblas 3.9.0 12_linux64_openblas conda-forge

libcurl 7.79.1 h2574ce0_1 conda-forge

libdeflate 1.8 h7f98852_0 conda-forge

libedit 3.1.20191231 he28a2e2_2 conda-forge

libev 4.33 h516909a_1 conda-forge

libffi 3.4.2 h9c3ff4c_4 conda-forge

libgcc-ng 11.2.0 h1d223b6_11 conda-forge

libgfortran-ng 11.2.0 h69a702a_11 conda-forge

libgfortran5 11.2.0 h5c6108e_11 conda-forge

libgomp 11.2.0 h1d223b6_11 conda-forge

liblapack 3.9.0 12_linux64_openblas conda-forge

libnghttp2 1.43.0 h812cca2_1 conda-forge

libopenblas 0.3.18 pthreads_h8fe5266_0 conda-forge

libpng 1.6.37 h21135ba_2 conda-forge

libssh2 1.10.0 ha56f1ee_2 conda-forge

libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge

libtiff 4.3.0 h6f004c6_2 conda-forge

libwebp-base 1.2.1 h7f98852_0 conda-forge

libzlib 1.2.11 h36c2ea0_1013 conda-forge

lz4-c 1.9.3 h9c3ff4c_1 conda-forge

matplotlib-base 3.4.3 py39h2fa2bec_1 conda-forge

ncurses 6.2 h58526e2_4 conda-forge

networkx 2.6.3 pyhd8ed1ab_1 conda-forge

numpy 1.21.3 py39hdbf815f_0 conda-forge

olefile 0.46 pyh9f0ad1d_1 conda-forge

openjpeg 2.4.0 hb52868f_1 conda-forge

openssl 1.1.1l h7f98852_0 conda-forge

pandas 1.3.4 py39hde0f152_0 conda-forge

patsy 0.5.2 pyhd8ed1ab_0 conda-forge

pillow 8.3.2 py39ha612740_0 conda-forge

pip 21.3.1 pyhd8ed1ab_0 conda-forge

pomegranate 0.13.3 py39h1a9c180_2 conda-forge

pyparsing 3.0.4 pyhd8ed1ab_0 conda-forge

python 3.9.7 hb7a2778_3_cpython conda-forge

python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge

python_abi 3.9 2_cp39 conda-forge

pytz 2021.3 pyhd8ed1ab_0 conda-forge

pyyaml 6.0 py39h3811e60_0 conda-forge

readline 8.1 h46c0cb4_0 conda-forge

scikit-learn 1.0.1 py39h7c5d8c9_1 conda-forge

scipy 1.7.1 py39hee8e79c_0 conda-forge

setuptools 58.4.0 py39hf3d152e_1 conda-forge

six 1.16.0 pyh6c4a22f_0 conda-forge

sqlite 3.36.0 h9cd32fc_2 conda-forge

statsmodels 0.13.0 py39hce5d2b2_0 conda-forge

threadpoolctl 3.0.0 pyh8a188c0_0 conda-forge

tk 8.6.11 h27826a3_1 conda-forge

tornado 6.1 py39h3811e60_1 conda-forge

tzdata 2021e he74cb21_0 conda-forge

wheel 0.37.0 pyhd8ed1ab_1 conda-forge

xz 5.2.5 h516909a_1 conda-forge

yaml 0.2.5 h516909a_0 conda-forge

yanocomp 0.2 pypi_0 pypi

zlib 1.2.11 h36c2ea0_1013 conda-forge

zstd 1.5.0 ha95c52a_0 conda-forge

JBerthelier commented 2 years ago

What was the error you got when using that file?

Regarding error with the Araport11 gtf file, we got the error message I displayed in my first message.

I tried to re-format the Araport11 gtf file manually, in order to look like my home-made transcriptome gtf (the one that works) and we still got this error


2021-11-25 16:42:39,591 INFO Loading GTF records from /flash/user/nanopolish-yanocomp_prep/Araport11/VIRc/VIRc_1/Araport11_GTF_genes_transposons.gtf 2021-11-25 16:42:43,841 INFO 58,207 transcript records loaded from GTF Traceback (most recent call last): File "/home/e/user/miniconda3/envs/yanocomp/bin/yanocomp", line 8, in sys.exit(cli()) File "/home/e/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1128, in call return self.main(args, kwargs) File "/home/e/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1053, in main rv = self.invoke(ctx) File "/home/e/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1659, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/e/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1395, in invoke return ctx.invoke(self.callback, ctx.params) File "/home/e/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 754, in invoke return __callback(args, kwargs) File "/home/e/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/opts.py", line 16, in _make_dataclass return cmd(dynamic_dataclass(cls_name, bases=bases, cli_kwargs)) File "/home/e/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/prep.py", line 182, in nanopolish_collapse save_events_to_hdf5( File "/home/e/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/io.py", line 180, in save_events_to_hdf5 for gene_id, chrom, strand, records in collapsed: File "/home/e/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/prep.py", line 154, in parallel_collapse for chunk, gene_info in pool.imap_unordered(_collapse_func, ea_chunker): File "/home/e/user/miniconda3/envs/yanocomp/lib/python3.9/multiprocessing/pool.py", line 870, in next raise value KeyError: 'AT1G01046.1'

mparker2 commented 2 years ago

OK I believe I understand the error regarding the GTF file now. AT1G01046.1 is a microRNA gene which is in both the GTF and fasta files you sent, but in the GTF is only annotated with the feature type (3rd column of GTF) "miRNA" or "miRNA_primary_transcript". Currently yanocomp only parses genes with "exon" feature types to generate the transcriptome to genome mapping database. So AT1G01046 is ignored in the GTF file and this leads to the error when a read mapping to it is encountered in the eventalign output.

GTF is not amazingly well defined but I think it is reasonable to expect that all transcripts (mRNAs, TEs, and non-coding RNA) should have exon records, so I'd say this GTF file is non-standard. You could try removing the non-coding and TE records from the GTF and fasta. Alternatively you could use gffread to clean the GTF and add exon attributes to the miRNA records. I will also update Yanocomp to handle these cases more gracefully (i.e. give a warning instead of an error).

I'm still not sure what is going on with the gmmtest error. Given that it is running ok for individual genes, it may be caused by a specific corner case in the input, or it could be that something weird is happening when you run in multiprocessing mode. You said that it gives the error almost immediately when you run for all genes? What if you set the number of processes to 1? i.e.:

yanocomp gmmtest -p 1 -c VIRc-1.hdf5 -c VIRc-2.hdf5 -c VIRc-3.hdf5 -c VIRc-4.hdf5 -t vir1-1.hdf5 -t vir1-2.hdf5 -t vir1-3.hdf5 -t vir1-4.hdf5 -o YGMM_output -s YGMM_output.json.gz
mparker2 commented 2 years ago

Also I should note that the miRNA and miRNA_primary_transcript records in the GTF and FASTA files are named identically so you may also have some problems with them clashing:

GTF:

1   Araport11   miRNA_primary_transcript    28500   28706   .   +   .   transcript_id "AT1G01046.1"; gene_id "AT1G01046";
1   Araport11   miRNA   28635   28655   .   +   .   transcript_id "AT1G01046.1"; gene_id "AT1G01046";

FASTA:

>AT1G01046.1
TACTTTTCTAATATCACGAGGACTTACATGGCCTCAAGTCACCTGTGGTGTTGTGCAAGAAGGAGAAGCA
AAGTCTGTCTATGTATTATGAGATAGCTACTTCTATGGCTAGGATATATGTTGTACAAGACCGGCTTTTC
TTCTACTTCTTGCACAACCTGAGGTTATTGAGGCTATACAAGTCTTCTTCTATAATGTTATTTATTA
>AT1G01046.1
TTTTCTTCTACTTCTTGCACA
JBerthelier commented 2 years ago

Dear Matthew, Thanks for figure it out the issue with the Araport11 gtf, I see the problem. Yes, I will produce a new version with gffread to fix it.

Regarding, gmmtest we re-run with -p 1, but after several hours we still got the same issue below, unfortunatively. Let me known if you need some files or extra info to help you to figure it out.

Best


2021-12-06 10:26:05,675 WARNING Default min depth set to 6 to match window size 3 2021-12-06 10:26:05,679 INFO Running gmmtest in 3-comp GMM (uniform outliers) mode with 4 control datasets and 4 treatment datasets 2021-12-06 10:26:17,733 INFO 15,465 genes to be processed on 1 workers /home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/stats.py:23: RuntimeWarning: Mean of empty slice. mu = X.mean(axis=0) /home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/numpy/core/_methods.py:181: RuntimeWarning: invalid value encountered in true_divide ret = um.true_divide( /home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/stats.py:24: RuntimeWarning: Mean of empty slice. sig = np.sqrt(((X - mu) 2.0).mean(0)) Traceback (most recent call last): File "/home/user/miniconda3/envs/yanocomp/bin/yanocomp", line 8, in sys.exit(cli()) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1128, in call return self.main(args, kwargs) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1053, in main rv = self.invoke(ctx) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1659, in invoke return _process_result(sub_ctx.command.invoke(sub_ctx)) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1395, in invoke return ctx.invoke(self.callback, ctx.params) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 754, in invoke return __callback(args, kwargs) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/opts.py", line 16, in _make_dataclass return cmd(dynamic_dataclass(cls_name, bases=bases, cli_kwargs)) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 333, in gmm_test res, sm_preds = parallel_test(opts) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 210, in parallel_test res, sm_preds = test_chunk( File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 164, in test_chunk was_tested, result, sm = position_stats( File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/stats.py", line 455, in position_stats r.ks_stat, ks_p_val = pca_kstest( File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/stats.py", line 71, in pca_kstest comps = pca_comp1(pooled) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/stats.py", line 28, in pca_comp1 i1 = np.argmax(s 2.0) File "<__array_function__ internals>", line 5, in argmax File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 1195, in argmax return _wrapfunc(a, 'argmax', axis=axis, out=out) File "/home/user/miniconda3/envs/yanocomp/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc return bound(*args, **kwds) ValueError: attempt to get argmax of an empty sequence

JBerthelier commented 2 years ago

Dear Matthew, Just to keep you posted that we did again the full analyses using an other home-made transcriptiome (stringtie transcriptiome for vir1) and this time the gmmtest finished without error. We got the transcriptome-wide bed file. I do not understand why it didn t finished with our other home-made transcriptome because it was similarly produced with stringtie.

mparker2 commented 2 years ago

Hi Jeremy, It is very strange. There must a specific corner case in your stringtie transcriptome that is causing the problem. It would be great if you could share the HDF5 files and the GTF file with me so I could work out what the gene is that is causing the problem. I think the error handling in general needs more work so that these corner cases do not disrupt the whole analysis. BW Matt