chhh / MSFTBX

MS File ToolBox - tools for parsing some mass-spectrometry related file formats (mzML, mzXML, pep.xml, prot.xml, etc.)
Apache License 2.0

Parsing huge MzIdentML identification file by MSFTBX #5

Closed: KaiLiCn closed this issue 7 years ago

KaiLiCn commented 7 years ago

Hi:

Recently I have been looking for an efficient way to parse MzIdentML files, so I want to try MSFTBX. Unfortunately, I couldn't find an example on your tutorial page, only the code for parsing pep.xml files. Could you send me more examples of MzIdentML parsing, or add them to the tutorial page? I'm especially interested in huge files.

I would really appreciate your help!

Best regards!

Kai

chhh commented 7 years ago

@KaiLiCn Have you tried just:

Path path = Paths.get("path/to/file.mzid");
MzIdentMLType mzid = MzIdentMLParser.parse(path);

?

You can find a short example in umich.ms.fileio.filetypes.mzidentml.example.MzIdentMlExample which prints some IDs from a parsed file.

I've never dealt with really huge mzid files. I don't know whether it's possible to split them into smaller pieces, but I doubt it, considering the amount of cross-referencing in the format.

KaiLiCn commented 7 years ago

Hi:

Thanks for your help. I parsed the file with MSFTBX and got the data I wanted!

However, I found a small problem in the package. When I read the unitAccession attribute of AbstractParamType, it was null. I then read the source code and found this:

    @XmlAttribute(name = "name", required = true)
    protected String name;
    @XmlAttribute(name = "value")
    protected String value;
    @XmlAttribute(name = "unitAccession")
    protected String unitAccession;
    @XmlAttribute(name = "unitName")
    protected String unitName;
    @XmlAttribute(name = "unitCvRef")
    protected String unitCvRef;

Some attribute names differ from those in the mzIdentML file: unitAccession (accession in mzid), unitCvRef (cvRef in mzid) and unitName (name in mzid). I tried changing them and it worked! Could you update the code so that the fix is available through Maven?

Thanks!

chhh commented 7 years ago

Thanks for the find! It is a little strange, though, as all those annotations follow the XSD schema (as I had it back when I was writing the code).

There are two possibilities: either the XSD has changed since then and the code needs to be updated, or there is a mistake in the mzid file (this also happens; I have definitely seen a lot of incorrect mzXML files).

I'm traveling for the next two weeks, will look into it when I come back. Could you please create a pull request with the changes that worked for you?

chhh commented 7 years ago

@KaiLiCn I've finally had time to look into it, and it looks like the code is actually correct. AbstractParamType, as defined in the XSD schema of mzIdentML (.mzid files), has only name, value and the unitXXX attributes (unitAccession, unitName, unitCvRef), which describe units of measurement. The accession and cvRef attributes you were looking for are defined in CVParamType, which extends AbstractParamType. For the most part, units of measurement are not specified in the actual files, which is why you get null.

I guess I see why you wanted to access AbstractParamType: it must be because there are methods like public List<AbstractParamType> getParamGroup(). The problem is that the mzid standard has both CVParams and UserParams, and both inherit from AbstractParamType. So every time you see AbstractParamType, you need to cast it yourself to one of the actual types:

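    // "blabla" is a placeholder for any parsed object that exposes getParamGroup()
    List<AbstractParamType> paramGroup = blabla.getParamGroup();
    for (AbstractParamType param : paramGroup) {
        if (param instanceof CVParamType) {
            // cvParam: accession and cvRef are available after the cast
            CVParamType p = (CVParamType) param;
            // do something with cvParam
        } else if (param instanceof UserParamType) {
            // userParam: only name, value and the optional unit attributes
            UserParamType p = (UserParamType) param;
            // do something with userParam
        }
    }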

There's no way around this, because both cvParams and userParams are stored in the same list.
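To make the cast pattern above concrete, here is a minimal, self-contained sketch. It does not use the real MSFTBX classes: the nested AbstractParamType, CVParamType and UserParamType below are simplified stand-ins written for illustration, and their constructors, getters and the describe() helper are assumptions rather than the library's API. The sample parameter values are illustrative too. The sketch only shows why unit attributes come back null when a file omits them, and how accession and cvRef become reachable after the cast.

```java
import java.util.ArrayList;
import java.util.List;

public class ParamDispatchSketch {

    // Simplified stand-in for the JAXB base class (NOT the real MSFTBX type).
    public static abstract class AbstractParamType {
        protected String name;
        protected String value;
        protected String unitAccession; // stays null when the file gives no unit

        public String getName() { return name; }
        public String getValue() { return value; }
        public String getUnitAccession() { return unitAccession; }
    }

    // Stand-in for cvParam: adds the accession and cvRef attributes.
    public static class CVParamType extends AbstractParamType {
        protected String cvRef;
        protected String accession;

        public CVParamType(String cvRef, String accession, String name, String value) {
            this.cvRef = cvRef;
            this.accession = accession;
            this.name = name;
            this.value = value;
        }

        public String getCvRef() { return cvRef; }
        public String getAccession() { return accession; }
    }

    // Stand-in for userParam: nothing beyond the base attributes in this sketch.
    public static class UserParamType extends AbstractParamType {
        public UserParamType(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Hypothetical helper: inspects the runtime type and formats the parameter.
    public static String describe(AbstractParamType param) {
        // Units are optional, so guard against null before printing them.
        String unit = param.getUnitAccession() == null ? "none" : param.getUnitAccession();
        if (param instanceof CVParamType) {
            CVParamType p = (CVParamType) param;
            return "cvParam [" + p.getCvRef() + "] " + p.getAccession() + " "
                    + p.getName() + "=" + p.getValue() + " (unit: " + unit + ")";
        } else if (param instanceof UserParamType) {
            UserParamType p = (UserParamType) param;
            return "userParam " + p.getName() + "=" + p.getValue() + " (unit: " + unit + ")";
        }
        return "unknown param type: " + param.getClass().getName();
    }

    public static void main(String[] args) {
        // Both kinds of parameter share one list, as with getParamGroup().
        List<AbstractParamType> paramGroup = new ArrayList<>();
        paramGroup.add(new CVParamType("PSI-MS", "MS:1001171", "Mascot:score", "45.2"));
        paramGroup.add(new UserParamType("myNote", "checked"));

        for (AbstractParamType param : paramGroup) {
            System.out.println(describe(param));
        }
    }
}
```

With the real library the same loop would run over the list returned by getParamGroup(); the null check on the unit attribute is the part that would have avoided the confusion earlier in this thread.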

KaiLiCn commented 7 years ago

Hi:

Sorry about that. I think you are right, and now I get it.

Thanks for your patience!

Kai