AlanRace / imzMLConverter

Tool for converting mass spectrometry data to the imzML format.

Problem with .obo files still persists in new release #11

Open VolkerH opened 2 years ago

VolkerH commented 2 years ago

Hi,

I left a comment on the already closed issue #10, but depending on your notification settings you may not be notified about comments on closed issues. Therefore I am opening a new issue.

As mentioned at the end of #10, unfortunately the new release 2.1.1 doesn't solve the issue with downloading the .obo files; I still get an error message. The workaround of copying an existing Ontologies folder with working .obo files into the working directory still works.

nfransaert commented 1 year ago

Hi,

I see this issue has not yet been addressed, and I get the same error when trying to convert .grd to imzML (using 2.1.1).

PS G:\My Drive\PhD\AIMS\jimzMLConverter-2.1.1> java -jar jimzMLConverter-2.1.1.jar imzML -p "D:\ToF-SIMS\Koen-Nico\aims-data\itmToGRD-testData\GRD\itm-header\i220617g_vds1_4.itm.properties.txt" "D:\ToF-SIMS\Koen-Nico\aims-data\itmToGRD-testData\GRD\GRD-and-header\i220617g_vds1_4.itm.grd"
dec 21, 2022 11:06:50 AM com.alanmrace.jimzmlconverter.MainCommand convert
INFO: Converting file D:\ToF-SIMS\Koen-Nico\aims-data\itmToGRD-testData\GRD\GRD-and-header\i220617g_vds1_4.itm.grd
dec 21, 2022 11:06:50 AM com.alanmrace.jimzmlconverter.MainCommand convert
INFO: Detected ION-TOF GRD file
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: -1
        at java.lang.String.substring(Unknown Source)
        at com.alanmrace.jimzmlparser.obo.OBO.<init>(OBO.java:115)
        at com.alanmrace.jimzmlparser.obo.OBO.<init>(OBO.java:119)
        at com.alanmrace.jimzmlparser.obo.OBO.<init>(OBO.java:119)
        at com.alanmrace.jimzmlparser.obo.OBO.loadOntologyFromURL(OBO.java:244)
        at com.alanmrace.jimzmlparser.obo.OBO.getOBO(OBO.java:173)
        at com.alanmrace.jimzmlconverter.ImzMLConverter.getOBOTerm(ImzMLConverter.java:209)
        at com.alanmrace.jimzmlconverter.ImzMLConverter.<init>(ImzMLConverter.java:95)
        at com.alanmrace.jimzmlconverter.GRDToImzMLConverter.<init>(GRDToImzMLConverter.java:82)
        at com.alanmrace.jimzmlconverter.MainCommand.convert(MainCommand.java:286)
        at com.alanmrace.jimzmlconverter.MainCommand.main(MainCommand.java:186)

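The cause is still a faulty pato.obo: the downloaded file is not an ontology but the HTML of a "302 Found" redirect page from purl.obolibrary.org.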

PS G:\My Drive\PhD\AIMS\jimzMLConverter-2.1.1\Ontologies> more .\pato.obo
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="https://raw.githubusercontent.com/pato-ontology/pato/master/pato.obo">here</a>.</p>
<hr>
<address>Apache/2.4.41 (Ubuntu) Server at purl.obolibrary.org Port 80</address>
</body></html>
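A quick way to spot this failure mode is to check whether any cached .obo file actually starts with HTML markup. This is a minimal sketch, assuming the Ontologies folder sits in the current working directory and using a simple heuristic: a real OBO file begins with plain-text header tags, so a leading `<` points to an HTML page or an XML document. `find_non_obo_files` is a hypothetical helper, not part of jimzmlconverter.

```python
from pathlib import Path


def find_non_obo_files(ontology_dir):
    """Return names of .obo files in ontology_dir that look like HTML/XML, not OBO.

    Heuristic (assumption): genuine OBO files start with plain-text header tags such
    as 'format-version:', so a first non-whitespace character of '<' indicates a saved
    web page (e.g. a '302 Found' redirect) or an XML/OWL document.
    """
    suspicious = []
    for path in sorted(Path(ontology_dir).glob("*.obo")):
        with path.open("rb") as fh:
            head = fh.read(512).lstrip()  # ignore leading whitespace/newlines
        if head.startswith(b"<"):
            suspicious.append(path.name)
    return suspicious


if __name__ == "__main__":
    # Hypothetical usage: run from the directory jimzmlconverter is started in.
    for name in find_non_obo_files("Ontologies"):
        print(f"{name} is not an OBO file; replace it with a working copy")
```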

The main goal of this comment is to draw attention to the issue, as the converter is practically unusable at this stage (at least for ION-TOF data), and previous discussions seemed to point toward a fairly simple fix.

Secondly, for now I would be happy with a temporary workaround. @VolkerH, you mentioned that copying an existing Ontologies folder works. I tried downloading pato.obo from https://raw.githubusercontent.com/pato-ontology/pato/master/pato.obo, but when I run the converter, the Ontologies folder is automatically updated with the faulty pato.obo again. Do you have any suggestions?
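If you fetch pato.obo yourself, make sure the redirect is followed so that you save the OBO text rather than the redirect page. This is a minimal sketch using Python's standard `urllib.request`, which follows HTTP redirects by default. `fetch_obo` and its `<` check are illustrative assumptions. They do not stop the converter from re-downloading the file, which is what happened here.

```python
import urllib.request
from pathlib import Path


def fetch_obo(url, destination):
    """Download an ontology and save it only if it does not look like HTML/XML.

    urllib.request.urlopen follows redirects (such as the 302 from
    purl.obolibrary.org), so the bytes read are those of the final target.
    The '<' check is a heuristic assumption, not jimzmlconverter behaviour.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()
    if data.lstrip().startswith(b"<"):
        raise ValueError(f"{url} returned HTML/XML, not an OBO file")
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    destination.write_bytes(data)
    return destination
```

For example, `fetch_obo("https://raw.githubusercontent.com/pato-ontology/pato/master/pato.obo", "Ontologies/pato.obo")` would write the file into an Ontologies folder below the current directory.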

Any response would be greatly appreciated!

VolkerH commented 1 year ago

Hi @nfransaert

I've been on holiday for the last three weeks. If you haven't managed to implement a workaround in the meantime, I can look up how I solved this (it has been a while) and send instructions.

VolkerH commented 1 year ago

@nfransaert ,

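I just checked how I solved this. In our application, we call jimzmlconverter from Python. I wrote a Python function that creates a temporary working directory, copies our bundled Ontologies folder into it, runs jimzmlconverter from there, and removes the directory afterwards.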

I can't see what you did differently or why it wouldn't work for you.

I can share a few code snippets (incomplete):

Context manager that does the steps above:

import os
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Optional, Union

import spacem_maldi.Ontologies  # our own package, which bundles the Ontologies folder
from importlib_resources import files

...

@contextmanager
def tmp_dir_with_ontologies():
    """Change the working directory to a temporary directory that contains the ontology files.

    This is a workaround for failed downloading of .obo files:
    https://github.com/AlanRace/imzMLConverter/issues/10

    jimzmlconverter looks for .obo files in a subfolder 'Ontologies' below the working
    directory. We create a temporary directory including such a subfolder, which we
    populate with bundled .obo files. When jimzmlconverter is executed in such a working
    directory, it won't try (and fail) to re-download the ontologies.

    When exiting the context manager, the temporary directory is removed.
    """
    previous_work_dir = os.getcwd()
    work_dir = tempfile.TemporaryDirectory()
    # We bundle the Ontologies folder in the Python package. You can also just copy
    # the folder from a known location on the filesystem.
    shutil.copytree(files(spacem_maldi.Ontologies) / ".", Path(work_dir.name) / "Ontologies")
    os.chdir(work_dir.name)
    try:
        yield
    finally:
        # Change back before deleting: Windows cannot remove the current working directory.
        os.chdir(previous_work_dir)
        work_dir.cleanup()

Using the context manager:

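import subprocess

with tmp_dir_with_ontologies():
    # Call jimzmlconverter as an external process. subprocess.run() waits for it to
    # finish; an unwaited Popen would let the temporary directory be deleted mid-conversion.
    subprocess.run(...)  # arguments elided
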
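For illustration, the elided call could mirror the command line used earlier in this thread, passed as an argument list. This is a sketch under assumptions: `build_grd_command` and `convert_grd` are hypothetical helpers, and the arguments simply reproduce `java -jar jimzMLConverter-2.1.1.jar imzML -p <properties> <grd>`.

```python
import subprocess


def build_grd_command(jar, properties_file, grd_file):
    """Assemble the GRD-to-imzML call as an argument list.

    Mirrors: java -jar jimzMLConverter-2.1.1.jar imzML -p <properties> <grd>
    A list (instead of one command string) avoids shell quoting problems with
    paths that contain spaces, such as 'G:\\My Drive\\...'.
    """
    return ["java", "-jar", str(jar), "imzML", "-p", str(properties_file), str(grd_file)]


def convert_grd(jar, properties_file, grd_file):
    # Blocks until Java exits and raises CalledProcessError on a non-zero exit code.
    return subprocess.run(build_grd_command(jar, properties_file, grd_file), check=True)
```

Because `tmp_dir_with_ontologies()` changes the working directory, resolve relative paths (for example with `Path(...).resolve()`) before entering the `with` block and then call `convert_grd` inside it.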
VolkerH commented 1 year ago

Here is the Ontologies folder I bundle with our Python package: Ontologies.zip

(Note: I closed the issue by accident, so I reopened it.)

nfransaert commented 1 year ago

Hi @VolkerH ,

Thank you for your detailed instructions. I got it to work using your Ontologies folder. I suspect that I was still missing a .obo file, which caused the Ontologies folder to be recreated even though I had manually downloaded pato.obo.

I tested the conversion with the test.grd and test.properties.txt files provided in this repo, and this worked fine (it created the .ibd, .imzML and .imzML.tmp.ibd files). However, when converting my actual data (an .itm of ~600 MB and a .grd of ~3.7 GB), the conversion takes about 2 hours.

How long did the converter take for your files? Do you think this is the expected time for datasets of this size, or is something else going on in my case specifically?

Thanks again!

VolkerH commented 1 year ago

Hi @nfransaert ,

I can't really give good advice on the speed. I set this up for users and haven't touched it for a year. I seem to remember that it was not super-fast for us either (tens of minutes, depending on the machine it runs on and the dataset size).

nfransaert commented 1 year ago

@VolkerH

Ok, thank you again for sharing your workaround and for helping me out!

Gscorreia89 commented 4 months ago

Hi,

I recently tried to use this converter but ran into the same ontology issues. After some debugging, I think I found the underlying problem and a fix. It is caused by the dependency of the PSI-MS-CV ontology on the STATO ontology, introduced 2 years ago: https://github.com/HUPO-PSI/psi-ms-CV/commit/1eb58b894fef787466010e60241aaf974737a232

The current jimzMLParser can parse ontologies in the .obo format, but STATO is only available as .owl. The parser automatically downloads STATO.owl and fails because it cannot handle .owl. A definitive fix would be to detect the ontology format and add support for parsing .owl.
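To check whether a given psi-ms.obo pulls in a non-OBO ontology, one can list the `import:` tags in its header. This is a minimal sketch, assuming the dependency is declared through the OBO `import:` header tag and that the file extension reflects the format. `find_imports` and `non_obo_imports` are hypothetical helpers, and the sample header in the test is made up for illustration rather than copied from psi-ms.obo.

```python
def find_imports(obo_text):
    """Return the values of 'import:' tags in an OBO file's header.

    The header is everything before the first stanza (a line starting with '[',
    e.g. '[Term]'). Text after an '!' is treated as a trailing comment.
    """
    imports = []
    for raw_line in obo_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):  # first stanza ends the header
            break
        if line.startswith("import:"):
            value = line[len("import:"):].split("!", 1)[0].strip()
            imports.append(value)
    return imports


def non_obo_imports(obo_text):
    """Imports an OBO-only parser cannot handle, judged by file extension (assumption)."""
    return [url for url in find_imports(obo_text) if not url.lower().endswith(".obo")]
```

For a real check, pass the file contents, e.g. `non_obo_imports(Path("Ontologies/psi-ms.obo").read_text(encoding="utf-8"))`, where the file name is an assumption.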

As a quick fix, I have built a new .jar using the ontologies bundled by @VolkerH, which seems to work so far. I also tried to build one with the latest versions of all ontologies after converting STATO.owl to .obo, but this ontology seems to have properties that break the .obo format, so it cannot be converted (at least not with ROBOT).

@AlanRace please let me know if you would like more details, happy to help fixing this.