Updates for v18.1? - Githubissues

bdelepine commented 6 years ago

Hi,

I am curious about your plans for the future development of this project. Do you intend to maintain it?

I noticed that BRENDA decided to release publicly their latest version (v18.1) in a text format, and your parser seems to be the only one dealing with it right now on GitHub. Anyway, I encountered a couple of issues/misunderstandings when I tried to use it and I figured it might be of interest to you if you wanted to push an update for the new release:

- Related to Python 3: "StandardError" should be replaced by "Exception".
- The Reference section is not parsed (returns a list of None): parser._parse_reference() is empty, so that's expected.
- The Protein section is not parsed (returns a list of None): it turned out that parser._parse_protein() does not return any Entry on purpose and that protein-related data are stored in content[ec_number][0].organisms.
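
For the first point, a minimal sketch of a fix that keeps a module importable on both Python 2 and Python 3. The names `BaseParserError`, `ParserError`, and `parse_ec_number` are hypothetical and not taken from the project's code, and the tab-separated "ID" line layout is an assumption for illustration:

```python
# StandardError exists only in Python 2; it was removed in Python 3.
# Fall back to Exception so the same module imports on both.
try:
    BaseParserError = StandardError  # Python 2
except NameError:
    BaseParserError = Exception  # Python 3


class ParserError(BaseParserError):
    """Hypothetical error type for malformed entries."""


def parse_ec_number(line):
    """Illustrative helper: extract the EC number from an assumed 'ID' line."""
    if not line.startswith("ID\t"):
        raise ParserError("expected an ID line, got: %r" % line)
    return line.split("\t", 1)[1].strip()
```

If Python 2 support is not needed, replacing "StandardError" with "Exception" directly is the simpler change.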

HTH

Midnighter commented 6 years ago

Hi, I haven't touched the code for the BRENDA parser in a long time indeed. There are several sections that I never completed the parser for because the information was of no interest to me at the time. I don't have a need myself right now, but if there is demand I can dedicate some time to updating it. Also, a release on PyPI would probably be helpful. Do you know of anyone beyond yourself who would be interested in an updated and improved parser?

bdelepine commented 6 years ago

Well, BRENDA is one of the best databases with metabolic reaction data. I know a few research groups and companies for whom those data are critical... but I do believe they all have their own scraper/parser in-house. The situation kinda changed with v18.1 since it is freely available (which had not been the case for a long time), and we can expect that those who used to work with scrapers will search for a good parser.

For my usage, I am pretty sure I can deal with what you already have done. The two things I am truly missing are 1/ the chemical compounds data (InChI, SMILES, etc.) but it is absent from the textfile and might be available through the API, and 2/ some logs about the parsing process to check that things went well.
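
On the second point, a minimal sketch of what such parsing logs could look like, using Python's standard `logging` module. The `parse_sections` function, the logger name, and the section handlers are hypothetical stand-ins for illustration, not the project's actual API:

```python
import logging

logger = logging.getLogger("brenda_parser_sketch")  # hypothetical logger name


def parse_sections(sections, handlers):
    """Apply one handler per section and log what was parsed or skipped.

    `sections` maps a section name to its raw lines; `handlers` maps a
    section name to a callable. Both shapes are assumptions.
    """
    parsed, skipped = {}, []
    for name, lines in sections.items():
        handler = handlers.get(name)
        if handler is None:
            # Make silently ignored sections visible in the log.
            logger.warning("no parser for section %s (%d lines skipped)",
                           name, len(lines))
            skipped.append(name)
            continue
        parsed[name] = [handler(line) for line in lines]
        logger.info("parsed %d entries from section %s",
                    len(parsed[name]), name)
    logger.info("done: %d sections parsed, %d skipped",
                len(parsed), len(skipped))
    return parsed, skipped


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    parse_sections(
        {"PROTEIN": [" example protein line "], "REFERENCE": ["example reference line"]},
        {"PROTEIN": str.strip},
    )
```

A warning per unhandled section would make cases like the empty Reference parser show up in the log instead of as a list of None.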

Midnighter commented 6 years ago

Is scraping the web content or using the API for all enzymes automatically against the terms of use? It would be much easier to get to all the content, I think, and I'd be happy to work on that.

bdelepine commented 6 years ago

Well, scraping is always considered as an "impolite move" to say the least, especially when there is an API, and even more when the data is freely available for download in a textfile :confused:

That being said, I was actually working on a scraper when they released v18.1... hence my interest in your parser. (The API is not complete: I needed the InChIKey data, which is still absent from the text file, by the way.)

Their latest terms are CC-BY-ND so we can do whatever we want as long as we do not release their data.

If you are willing to build a Python package to gather all BRENDA data, I would be happy to contribute.

Midnighter commented 6 years ago

I have created a Gitter chat room that you can easily log in to using your GitHub account. Let's chat more there.

Taktubo commented 5 years ago

I am also very interested in this API and its use in parsing BRENDA text files into an easily manipulable form. Have there been any updates? I am re-learning Python (my main language is R) and am having a bit of trouble using your script. Would you mind updating the README with better instructions on how to download and use the script? A personal message would also work, as I am the only person working on my project at the moment.

Midnighter commented 5 years ago

Hi @Taktubo, thank you for your interest. I have put further development on hold because it seemed that my work was redundant. For example, a lot of the BRENDA content is now contained in SABIO-RK (http://sabio.villa-bosch.de/). Please let me know if that contains what you are looking for. If it does not, I'm happy to pick up work on this again.

Taktubo commented 5 years ago

@Midnighter I am sorry to hear that you believe your endeavors were redundant, and I would like to extend my appreciation for your work thus far. I found your "brenda-parsing" script to be the most helpful tool, even if it wasn't complete, because it gave me a great head start on collecting the data from BRENDA. I also really appreciate the recommendation to use SABIO. In some ways it is more comprehensive and easier to use; however, I believe SABIO still lacks information that BRENDA does have, such as sequencing data.

bdelepine commented 5 years ago

Hello @Midnighter and @Taktubo, Have you seen a complete comparison of SABIO vs. BRENDA out there? Do we have any idea of the overlap between the two? :confused:

From SABIO-RK's paper (PMC5753344):

> A comparable database providing kinetic parameters is BRENDA (7). In contrast to SABIO-RK, the information in BRENDA is centred on enzymes and their kinetic constants, whereas SABIO-RK focuses on reactions and additionally, beside constants, offers the associated kinetic rate laws, formulas and experimental conditions.

Midnighter commented 5 years ago

I'm not aware of any such comparison sadly. Maybe @BenjaSanchez, @joaocardoso, or @jonrkarr know more about this topic?

jonrkarr commented 5 years ago

I wasn't aware of the new download and haven't looked into it. Consequently, we haven't done a detailed comparison of the two databases.

We've focused on SABIO-RK for several reasons:

- BRENDA seems more difficult to reconstruct with scraping.
- Because BRENDA (at least the website) relies on the EC classification, it has less detail about the specific reactions catalyzed by each enzyme.
- Without seeing the underlying data model for BRENDA, it has never been clear to me if BRENDA has joint information about kinetic parameters (e.g. pairs of measurements of Vmax, Km).

However, we haven't found SABIO-RK particularly easy to use either.

BenjaSanchez commented 5 years ago

@Midnighter I also don't know of any comparison out there. I have always used BRENDA because, afaik, it has larger coverage than SABIO-RK.

Midnighter / BRENDA-Parser

Updates for v18.1? #4