bartongroup / yanocomp

Yet another nanopore modification comparison tool
MIT License

error in gmmtest #10

Open JannesSP opened 2 years ago

JannesSP commented 2 years ago

2022-10-07 12:37:26,831 WARNING Default min depth set to 6 to match window size 3
2022-10-07 12:37:26,836 INFO Running gmmtest in 3-comp GMM (uniform outliers) mode with 1 control datasets and 1 treatment datasets
2022-10-07 12:37:26,839 INFO 1 genes to be processed on 1 workers
Traceback (most recent call last):
  File "/home/yi98suv/anaconda3/envs/yanocomp/bin/yanocomp", line 8, in <module>
    sys.exit(cli())
  File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/opts.py", line 16, in _make_dataclass
    return cmd(dynamic_dataclass(cls_name, bases=bases, **cli_kwargs))
  File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 333, in gmm_test
    res, sm_preds = parallel_test(opts)
  File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 210, in parallel_test
    res, sm_preds = test_chunk(
  File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/gmmtest.py", line 155, in test_chunk
    chrom, strand = load_gene_attrs(gene_id, cntrl_h5)
  File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/yanocomp/io.py", line 291, in load_gene_attrs
    chrom = g.attrs['chrom']
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/home/yi98suv/anaconda3/envs/yanocomp/lib/python3.9/site-packages/h5py/_hl/attrs.py", line 60, in __getitem__
    attr = h5a.open(self._id, self._e(name))
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5a.pyx", line 77, in h5py.h5a.open
KeyError: "Can't open attribute (can't locate attribute: 'chrom')"

Any idea how to fix this or where the error is coming from?

mparker2 commented 2 years ago

Hi @JannesSP,

Looks like there is an issue with the format of the HDF5 file: specifically, no chromosome attribute was saved for the gene, for some reason. Could you please describe how you ran yanocomp prep and yanocomp gmmtest? I noticed that you are only testing one gene; is this a viral RNA?

BW Matt

JannesSP commented 2 years ago

Hey @mparker2,

Exactly, I am testing it on viral RNA, so there is only one "chromosome", to which I aligned my reads and on which I ran nanopolish eventalign. I ran yanocomp prep like this:

yanocomp prep -e nanopolish_eventalign.tsv -h yanocomp_prep.hdf5 -p 12

For yanocomp gmmtest I currently have only one control and one test sample.

Kind regards, Jannes

JannesSP commented 2 years ago

Hey @mparker2

Found the problem: the reference FASTA file I downloaded from a database had the '/' character in its header, which then shows up as the contig name in the nanopolish eventalign .tsv. The h5py API interprets '/' as the group separator when creating the HDF5 objects (datasets, groups etc.) during yanocomp prep. As a result, multiple unwanted nested groups were created in the HDF5 file, and gmmtest could not find the 'chrom' attribute on the group it looked up for the gene_id. Maybe you could catch this error, replace the character in the code, or tell users to check their FASTA headers (gene_id) for it.
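For anyone wanting to check their own data before running yanocomp prep, here is a minimal sketch that scans an eventalign table for contig names containing '/'. It assumes the standard nanopolish eventalign output, i.e. a tab-separated file with a header row and a "contig" column; `contigs_with_slash` is a hypothetical helper and not part of yanocomp.

```python
import csv


def contigs_with_slash(eventalign_tsv):
    """Return the sorted contig names in an eventalign TSV that contain '/'.

    Hypothetical helper, not part of yanocomp. Assumes a tab-separated
    file with a header row that includes a "contig" column.
    """
    bad = set()
    with open(eventalign_tsv, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            contig = row["contig"]
            # '/' is the path separator in HDF5, so a name like "a/b"
            # ends up as group "a" containing "b" rather than one group.
            if "/" in contig:
                bad.add(contig)
    return sorted(bad)
```

An empty list from `contigs_with_slash("nanopolish_eventalign.tsv")` would suggest that this particular cause can be ruled out.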

Kind regards, Jannes

mparker2 commented 2 years ago

Aha! Well done for figuring that out. Sorry I didn't get any time to help... I should really sanitise the strings used to create all attributes better. Will mark this as a bug to get around to! Many thanks for reporting it.
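One way such sanitising might look, purely as a sketch: percent-encode each name before using it as an HDF5 group key, so that '/' becomes '%2F' and the original name can still be recovered afterwards. `encode_hdf5_name` and `decode_hdf5_name` are hypothetical helpers for illustration; this is not how yanocomp currently handles names.

```python
from urllib.parse import quote, unquote


def encode_hdf5_name(name):
    """Percent-encode a name so that it contains no '/' (hypothetical helper)."""
    # safe="" makes quote() escape '/' too (as '%2F'). '%' itself is also
    # escaped, so distinct names always map to distinct encoded names.
    return quote(name, safe="")


def decode_hdf5_name(encoded):
    """Recover the original name from an encoded group key."""
    return unquote(encoded)
```

Compared with simply replacing '/' by '_', a reversible encoding avoids two different gene_ids collapsing onto the same key, at the cost of less readable group names.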

Matt

kwonej0617 commented 1 year ago

Hi @mparker2, because I got the same error message as @JannesSP, I checked whether the header of my FASTA file includes "/", which would cause the error.

Here is the first line of my FASTA file. There is no "/" at the very top of the file. (image)

However, in the middle of the file I found lines that include "/". Could these be causing the error? If so, how can I modify my FASTA file so that yanocomp gmmtest runs without the error? (image)

I am looking forward to hearing from you. Thank you!
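If the lines containing "/" are record headers (lines starting with ">", which can appear anywhere in a multi-record FASTA), a possible workaround is to write a copy of the file with "/" replaced in those headers. The sketch below assumes that is the cause; `sanitise_fasta_headers` is a hypothetical helper, not part of yanocomp, and the choice of "_" as the replacement is arbitrary.

```python
def sanitise_fasta_headers(in_path, out_path, replacement="_"):
    """Copy a FASTA file, replacing '/' in every record header.

    Hypothetical helper, not part of yanocomp. Returns the number of
    header lines that were changed; sequence lines are copied as-is.
    """
    changed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Only header lines (starting with '>') carry the names that
            # end up as contig names downstream.
            if line.startswith(">") and "/" in line:
                line = line.replace("/", replacement)
                changed += 1
            dst.write(line)
    return changed
```

Since the contig names in the eventalign table come from the reference, the alignment, nanopolish eventalign, and yanocomp prep steps would presumably need to be rerun against the renamed reference before running gmmtest again.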