MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server
14 stars 22 forks source link

Replace molfiles upload and gif rendering by CDK rendering of SVGs from records #47

Closed tsufz closed 6 years ago

tsufz commented 7 years ago

Exchange the GIF view by SVG, can be rendered using OpenBabel (on system anyway). Needs also changes in /mbadmin/StructureList.jsp etc.

tsufz commented 7 years ago

I suggest to use CDK in the back end to render the records from the records (smiles or specific CDK tags) and store the svg (to replace molfile upload and gif rendering).

tsufz commented 7 years ago

We need a parsing tool which processes the structure files from the records automatically. This issue is related to discussions in #10 and #12. Merged from #36 Schymane:

We cannot produce structures from InChIKeys, but rather from the SMILES or InChI fields, both of which are compulsory.
If these are “NA” or similar, then the structural information doesn’t exist and thus we can’t plot a structure anyway.
The idea of working off SMILES would be that we may also be able to render from generic SMILES …
schymane commented 7 years ago

We should be careful not to mix non-standard SMILES in the official field, as these cannot be processed by all toolkits and will affect other users downstream. Thus we should probably consider to parse structures from CH$SMILES and if this does not exist, nothing is displayed in the main window. What @tsufz and I will test out is whether we can add some special cases in CH$LINK fields, so that people can display less-well-defined structures if they chose via a link, e.g. something like:

CH$LINK: CDK_DEPICT_SMILES

CH$LINK: CDK_GENERIC_SMILES

CH$LINK: CDK_SUBSTRUCTURE_SMILES

tsufz commented 7 years ago

I see no problems so far the non-existing smiles are correctly tagged with n/a or so. Needs to be checked and curated if necessary. If the smiles entry is empty, then the script searches for the other predefined tags and renders from here.

uchem-massbank commented 7 years ago

I don’t think you can upload without CH$SMILES: N/A. From the data dump I did yesterday 667 records have SMILES N/A in the official “OpenData” collection (three JEL, rest JP entries), then there are our tentative special cases where the SMILES really doesn’t exist. The 667 would need to be curated at some point (e.g. Name is available, but all caps), but for the Eawag tentative spectra, the order you propose could work … if CH$SMILES is N/A, then look for CH$LINK: CDK_ …. But I would of course have to update the records to contain those fields before that would work … :-)

tsufz commented 7 years ago

That's actually the point, we need curation of records. At least to standardise such records with different presentation as mentioned by @uchem-massbank.

schymane commented 7 years ago

uchem-massbank = schymane. It’s random which notification I get and meowcat is on holidays… Just keeping you all on your toes… :-)

Treutler commented 7 years ago

started the replacement of GIF images by SVG images in 388db5bf958b5251a6205a51a876cac9270fb4b1

tsufz commented 7 years ago

Hi, when I click on the structure to enlarge, an empty pop up is shown. The reason is that the port 8080 is incorrect: http://localhost:8080/MassBank/temp/170705_180554_230_BML00743.svg

Should be http://localhost:80/MassBank/temp/170705_180554_230_BML00743.svg which works

tsufz commented 7 years ago

Another question, are the svg created once and then reused or are they rendered always in runtime? In the last case, we should consider regularly clean up of the temp folder (e.g. once a day).

Treutler commented 7 years ago

the svg files are created on runtime in the folder path.tomcat.app.temp=/Massbank/.metadata/.plugins/org.eclipse.wst.server.core/tmp0/wtpwebapps/MassBank/temp/ which is given by MassBankEnv.get(MassBankEnv.KEY_TOMCAT_APPTEMP_PATH); A daily cleanup of the folder is advisable

tsufz commented 7 years ago

Last comment solved with commit https://github.com/MassBank/MassBank-web/commit/79fcbf97deac515fa17fa682840483d2c4d388ba

tsufz commented 7 years ago

Emma wishes to have an structure viewer in mbadmin which replaces the old molfile viewer to see what is rendered actually. Should be relatively simply to implement.

tsufz commented 7 years ago

Molfile registration is not working properly. I suggest to work with high priority on the SVG implementation and to deprecate molfiles as structure sources,

HTTP Status 500 - An exception occurred processing JSP page /mbadmin/StructureRegist.jsp at line 355

type Exception report

message An exception occurred processing JSP page /mbadmin/StructureRegist.jsp at line 355

description The server encountered an internal error that prevented it from fulfilling this request.

exception

org.apache.jasper.JasperException: An exception occurred processing JSP page /mbadmin/StructureRegist.jsp at line 355

352: String masterName = ""; 353: File masterFileName = null; 354: File registFileName = null; 355: int startId = Integer.parseInt(registInfo[1]); 356: int nextId = startId; 357: boolean isMolRegist = (dataPath.indexOf(MOLDATA_DIR_NAME) != -1); 358: ArrayList copiedFiles = new ArrayList();

Stacktrace: org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:578) org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:476) org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:401) org.apache.jasper.servlet.JspServlet.service(JspServlet.java:345) javax.servlet.http.HttpServlet.service(HttpServlet.java:729) org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)

root cause

java.lang.NumberFormatException: For input string: "1.0" java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) java.lang.Integer.parseInt(Integer.java:580) java.lang.Integer.parseInt(Integer.java:615) org.apache.jsp.mbadmin.StructureRegist_jsp.regist(StructureRegist_jsp.java:345) org.apache.jsp.mbadmin.StructureRegist_jsp._jspService(StructureRegist_jsp.java:1136) org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) javax.servlet.http.HttpServlet.service(HttpServlet.java:729) org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:438) org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:401) org.apache.jasper.servlet.JspServlet.service(JspServlet.java:345) javax.servlet.http.HttpServlet.service(HttpServlet.java:729) org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)

note The full stack trace of the root cause is available in the Apache Tomcat/8.0.32 (Ubuntu) logs. Apache Tomcat/8.0.32 (Ubuntu)

schymane commented 7 years ago

Does this apply to the current MassBank.EU upload? I have not had issues on our local MassBank, nor did I have any issues with the latest records I uploaded, but I think for this case I did not change any structures… Please let me know.

tsufz commented 7 years ago

The current server is not affected. The new server will be affected! However, no way out to move to the new server next week. @Treutler and @meier-rene , maybe there is an easy way to fix the bug? This problem looks a little bit similar to the problem we had before with record registration (only in case if more than one DB is existing). See #31. I will open a new issue.

uchem-massbank commented 7 years ago

Does that mean I should consider upload BEFORE the move to the new server? I could … but then quite a few of the DTXSIDs just won’t be active until the data release end of July … but we could either accept this, or just deactivate the hyperlinking until everything is live. The majority of substances were already registered.

I would have most libraries ready either today or very early next week/over the weekend if that’s the case. I have one I could test right now where only one DTXSID is not yet “live”. Let me know.

tsufz commented 7 years ago

No, it is impossible to register molfiles on the new server if the problem in #72 is not fixed. However, I would like to put the efforts into the full implementation of SVG rendering from SMILES or CDK-depicts in records in all views. It is implemented in the results.jsp displaying the lists of records, but not yet in the records view. @Treutler what is the time frame for full implementation? I like to get rid of the molfiles.

schymane commented 7 years ago

Hence my question: should I upload before? When do you upgrade the server? What is that timing? I know you want to get rid of the mol files … but we need to make sure that the JP records will also work and some structures and names will change! This is pretty important to get right. If I am not able to replace or even delete the old mol files things will be an awful mess because everything will be inconsistent. Long term of course we should try to get rid of them, but we need to get the transition right. I can also wait with upload til end of July, when the data is live. Does this give us enough time – also to implement on JP side?

tsufz commented 7 years ago

We will prepare the new server from Monday. Hence, you should upload the new stuff during the weekend. Consider that if you change DB names, they will not be replicated to massbank.jp because we need to have the mirror structure there, for example UFZ on massbank.eu -> UFZ_backup on massbank.jp. The Eawag DB is not replicated anyway, because it Eawag on massbank.eu and Eawag on massbank.jp. Hence, you need to send your new records to Takaaki-san anyway.

Treutler commented 7 years ago

The full support of the structure-rendering by SVGs is almost complete. In addition, I would like to test it a bit more thoroughly. I think I can push it on monday. In this case, we can render structures from InChI/SMILES. If there is no InChI or SMILES in the record, then we have no structure-image-SVG, because molfiles are not used anymore for this.

tsufz commented 7 years ago

Ah, great. Hence, I could roll out with new server. Do you include the rendering from CDK-tags as well? This would be great because we can use substructures, Markush (?), skeletons with unknown functions etc. (see Record format (https://github.com/MassBank/MassBank-web/blob/master/modules/apache/html/MassBank/manuals/MassBankRecordFormat_en.pdf) section 2.2.7. Btw. do we need different tags for the handling or could be just send as it is to CDK?

schymane commented 7 years ago

Those entries can be sent “as is” to CDK, but we should test it properly. I think the decision at the moment to only display structures if there is a CH$SMILES or CH$IUPAC (InChI) field is the right way to start and we can cover the other cases later? The substructure marking on a full SMILES code will need to be implemented with a SMILES and SMARTS combination which is not currently possible.

Treutler commented 7 years ago

in the moment the fields CH$SMILES and CH$IUPAC are queried to get the structure. I did not consider the fields CH$LINK because CH$SMILES and CH$IUPAC are mandatory fields. But in principle we can also implement the display of (sub)structures from CH$LINK in the next days if this is desirable.

uchem-massbank commented 7 years ago

Would we then have to define a priority order which one is taken in which order? So far only ~4 records have any form of SMILES in a CH$LINK field… CH$SMILES first, then CH$IUPAC, then CH$LINK …?

tsufz commented 7 years ago

I would appreciate this. @Treutler I did sent you those two G.... records which includes the CDK stuff. However, specific tagging is only necessary if we use the Disp.cgi to link to external servers in order to render correct URLs.

Now, we are able to use internal CDK to render and hence, those specific tags are obsolete. I would suggest to deploy an optional CH$CDK tag to map the CDK replacing the CH$LINK tags because we don't link anymore but use information from the record.

The rank would be CH$SMILES, CH$IUPAC, CH$CDK; to override mandatory tags, we use n/a for replacement (mandatory tags are checked during upload).

schymane commented 7 years ago

What happens when we want multiple entries? This is why I wanted the CH$LINK option ... because this lets us save more than one and indicate what they are. A CH$CDK field is (a) not indicative of what the entry is (CDK is the tool kit and not the identifier) and (b) won't let me save multiple options. There are different styles: they are rendered/depicted with the same function, but their validity is different.

Regarding upload time, nothing coming from my side before the DTXSID entries are live on the Dashboard, which is early August.

tsufz commented 7 years ago

Well, then we create first level CH descriptors such as CH$CDK_DEPICT_SMILES. CH$LINK indicates an link to an external database, but we render real time on the server.

Regarding the upload, no problem. The links are alive and you can upload everytime after the new server is running and stable.

Treutler commented 7 years ago

Replaced the usage of gif images and molfiles by svg images by commit eb8dd93683d7993afb266c3f493ff9a865b0b79a. The tags CH$SMILES and CH$IUPAC are used to query the SMILES/InChI (in this order of priority). The tag CH$CDK is not used currently. To be honest: I am not sure, what exactly is the relation of the discussion about the tag CH$LINK with this issue. Please clarify.

tsufz commented 7 years ago

The CDK_DEPICT tags are used to render particularly structures such as c1ccc(cc1)/C=C/C(=O)O[R]. At least in the record view, they should be rendered and shown (if there are different tags iteratively in different tiles). I created now the CH$CDK_DEPICT tag (see commit 22792c4d54d56bee4cbb5360afdff729661590d1) in section 2.2.7 in order to exemplify. The CH$LINK is only used to link to external databases (e.g. ChemSpider) and not for internal processing.

tsufz commented 7 years ago

Something is wrong with https://github.com/MassBank/MassBank-web/commit/eb8dd93683d7993afb266c3f493ff9a865b0b79a. When I create a new DB, then the massbank.conf is changed and a new DB is created in /mariadb. But in the recordindex, the new database is not shown!

If I change http://192.168.35.18/MassBank/jsp/Result.jsp?type=rcdidx&idxtype=site&srchkey=0&sortKey=name&sortAction=1&pageNo=1&exec= to http://192.168.35.18/MassBank/jsp/Result.jsp?type=rcdidx&idxtype=site&srchkey=1&sortKey=name&sortAction=1&pageNo=1&exec=, then the data from the new DB are listed.

tsufz commented 7 years ago

Another topic. Were the paths to the temp folders changed? Should be /var/lib/tomcat8/webapps/MassBank/temp. I have some problems if I just copy the new webappp to an existing server. The mbadmin does not work because obviously it cannot find the temp folder and deploy it's created temp files....

Treutler commented 7 years ago

The problem that a second DB is not shown in the RecordIndex.jsp is solved by 3d996ee9aad8c16957f373e06ea792fe6a99861f

Treutler commented 7 years ago

Prepared the SVG rendering of CH$CDK_DEPICT_SMILES, CH$CDK_DEPICT_GENERIC_SMILES, and CH$CDK_DEPICT_STRUCTURE_SMILES in RecordDisplay.jsp in commit c212de0fc8928ae519cd712f2f20afa105aae1e4. These tags need to be implemented in #9 and will work afterwards. Hence, this issue closes with #9.

Did I miss anything?

uchem-massbank commented 7 years ago

Thanks! Has the record index been updated? Or is this part of #9?

Treutler commented 7 years ago

Right, the implementation of new record fields is part of #9. This includes the extension of the database schema and the upload, query, and usage of this data from the database in all parts of MassBank.

tsufz commented 7 years ago

I don't see the CDK_DEPICT stuff after upload of the test spectra. The setting

GP055811.txt CH$CDK_DEPICT_STRUCTURE_SMILES: OCC(O)C(O)C(O)C=O

GP055812.txt: CH$CDK_DEPICT_GENERIC_SMILES: c1ccc(cc1)/C=C/C(=O)O[R] CH$CDK_DEPICT_STRUCTURE_SMILES: c1ccc(cc1)/C=C/C(=O)O

@Treutler

Treutler commented 7 years ago

These tags need to be implemented in #9 and will work afterwards. I hope this delay is okay.

tsufz commented 7 years ago

@Treutler : Is there any reason to listen on localhost:8080 (see Listener.java)? If this is needed for internal issues, okay. For external display, this URL is not working (only on port 80).

The deployment of the enlarged SVG should encoded with normal server URL (e.g. ServerURL/temp/...). If not, the enlarged structures cannot be shown (see comment above).

I don't like to open another port or build another virtual host.

Treutler commented 7 years ago

Moved the svg temp path to from localhost:8080 to localhost in commit 4ff3479c96b1dcc215e368d0e9c60e78b6ddbd6e.

tsufz commented 7 years ago

Close with #9

Treutler commented 6 years ago

Rendering of structures is done. If there are open points here, please open a new issue.