MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

[SOLVED] Deprecated records #171

Closed tsufz closed 3 years ago

tsufz commented 5 years ago

@schymane and I chatted a bit about the handling of deprecated records. We agreed that those records should be tagged rather than removed, for historical reasons. We suggest moving those records to a dedicated deprecated folder and tagging them with a deprecated tag, which should be added to the title, like 13a-Hydroxylupanin; LC-ESI-ITFT; MS2; CE: 10%; R=15000; [M+H]+ (deprecated).

The tag could be placed directly under the accession:

Deprecated: This record was deprecated on date()

Should we add a comment explaining why the record was deprecated (for learning purposes)?

meier-rene commented 5 years ago

Hi all, today we talked a bit about a proper way to implement a mechanism for record deprecation:

One option would be to introduce a tag, let's call it '[DEPRECATED]', and put it in the title. Because we want to limit the potential for breaking 3rd party clients, we propose the tag like this:

RECORD_TITLE: Bassanolide; LC-ESI-ITFT; MS2; CE: 55; R=17500; [M+Na]+

To mark this record as deprecated:

RECORD_TITLE: [DEPRECATED] Bassanolide; LC-ESI-ITFT; MS2; CE: 55; R=17500; [M+Na]+

The first field in the record title is free format and this change will most likely not break 3rd party clients.

Treatment of deprecated records by MassBank: Records marked with '[DEPRECATED]' will not be found by keyword search, because they will not be parsed and written to the database. The only possible operation on deprecated records is a record display of plain text without displaying a spectrum or a chemical structure.
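
A minimal sketch of how a third-party script might honour this proposal, assuming the tag sits at the very start of the RECORD_TITLE value exactly as in the example above. The names `is_deprecated` and `indexable` are hypothetical and not part of MassBank-web; the actual server-side handling may differ.

```python
from __future__ import annotations

DEPRECATED_TAG = "[DEPRECATED]"  # tag proposed in this thread


def is_deprecated(record_text: str) -> bool:
    """Return True if the RECORD_TITLE value starts with the deprecation tag."""
    for line in record_text.splitlines():
        if line.startswith("RECORD_TITLE:"):
            title = line[len("RECORD_TITLE:"):].strip()
            return title.startswith(DEPRECATED_TAG)
    return False  # no RECORD_TITLE line found (assumption: treat as not deprecated)


def indexable(records: list[str]) -> list[str]:
    """Keep only the records that would be parsed for keyword search."""
    return [record for record in records if not is_deprecated(record)]
```

A client that never looks at the title would not notice the tag at all, which is the trade-off discussed in the following comments.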

Of course it's possible to give the reason for deprecation in the COMMENT section.

I would appreciate your opinion...

schymane commented 5 years ago

I like this way of handling it in general ... I am just wondering if we should add another (non-compulsory) tag, so that it is not just in the title, as an alternative / in addition to the reason for deprecation in COMMENT. The reason is that I for one do not often parse the TITLE field, but I would e.g. parse a "DEPRECATED" field if I wanted to check that nothing's deprecated ... (otherwise how would we detect it without parsing the title field?). A "DEPRECATED" field could either be empty or contain the reason for deprecation (rather than requiring yet another COMMENT)?

schymane commented 5 years ago

Alternative: make a systematic "COMMENT [DEPRECATED]" recommendation for adding the reasons. But then we may as well add an official tag ... it's also rare to parse the comments because they are free text and rather difficult to process automatically...

Treutler commented 5 years ago

Potentially, deprecated records are not syntactically valid anymore. Hence, we can add every field we want. Possibilities would be fields like

schymane commented 5 years ago

@tsufz also made the point that we could/should document date of deprecation (I see you just added that), I'd suggest date first and prefer this:

schymane commented 5 years ago

We should also add this to the Record Specification. With DEPRECATED: YYYY-MM-DD free text we'd have good flexibility to auto-parse the essential information and leave flexibility for reasons. Optional addition of name/github handle of curator/deprecator?
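
To illustrate the auto-parsing, here is a rough sketch that splits a line of the proposed form DEPRECATED: YYYY-MM-DD free text into a date and an optional reason. It assumes an ISO date directly after the tag; `Deprecation` and `parse_deprecated` are made-up names for illustration, not an existing MassBank API.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class Deprecation(NamedTuple):
    when: date
    reason: str  # empty string if no free text was given


# Tag, ISO date, then optional free text (format as proposed in this thread).
_DEPRECATED_LINE = re.compile(r"^DEPRECATED:\s*(\d{4}-\d{2}-\d{2})(?:\s+(.*))?$")


def parse_deprecated(line: str) -> Optional[Deprecation]:
    """Parse one record line; return None if it is not a DEPRECATED line."""
    match = _DEPRECATED_LINE.match(line.strip())
    if match is None:
        return None
    when = date.fromisoformat(match.group(1))  # raises ValueError for impossible dates
    reason = (match.group(2) or "").strip()
    return Deprecation(when, reason)
```

Putting the date first keeps the machine-readable part at a fixed position, so the reason can stay free text.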

meier-rene commented 5 years ago

If we add a new field like DEPRECATED: considered noisy (03/05/19) we might break 3rd party software. On the other hand, this new tag might prevent 3rd party software from processing invalid data.

So the question here is: Do we want to force 3rd party software to be aware of our deprecation mechanism?

schymane commented 5 years ago

I guess that could be avoided by:

COMMENT: DEPRECATED: 2019-05-03 considered noisy

I personally still prefer a new tag but would accept either, we just need to know as we need to mark some soon ;-)

meowcat commented 5 years ago

Do we want to force 3rd party software to be aware of our deprecation mechanism?

Except for RMassBank, what software will be bothered by this?

Treutler commented 5 years ago

Basically every piece of software which parses MassBank records. I guess that MoNA parses MassBank records from time to time or some scripts of scientists using MassBank records...

tsufz commented 5 years ago

Hi,

  1. I also prefer a dedicated tag. This is the easiest way for third party developers to avoid deprecated records.
  2. I would not remove the spectral part, for the sake of data consistency. The reader of a paper must be able to review the old record in order to decide whether the message of the paper is reliable.
  3. I suggest moving all deprecated records to their own folder.
  4. We should actively advise known third party software maintainers to implement a corresponding check in their software (e.g. by an announcement at MassBank.eu and by writing issues if the software is open source). It might also be possible to use contacts at the machine vendors to pass the issue on to their software developers.

Treutler commented 5 years ago

I agree with 1, 2, and 4.

I suggest moving all deprecated records to their own folder.

I think we should leave the records in the same place. This has the advantage that we avoid assigning the same accession code twice, the assignment of records to contributors stays clear, and it is not necessary to move records. What is the rationale for moving deprecated records to a separate folder?

schymane commented 5 years ago

Agree with @Treutler re all points ....

sneumann commented 5 years ago

Hi, one more thought: one could remove (large) parts of the record, and point to the last git state of the record:

ACCESSION: SMI00034
RECORD_TITLE: Glucolesquerellin; LC-ESI-QTOF; MS2; CE:40 eV;
DATE: 2012.08.31 (Created 2012.08.31)
AUTHORS: S. Neumann: IPB-Halle, Germany & E. Schymanski: Eawag, Switzerland
LICENSE: CC BY
COPYRIGHT: CASMI2012
PUBLICATION: Schymanski, E.; Neumann, S. The Critical Assessment of Small Molecule Identification (CASMI): Challenges and Solutions. Metabolites 2013, 3 (3), 517–38. DOI:10.3390/metabo3030517
COMMENT: http://casmi-contest.org/challenges-cat1-2.shtml
COMMENT: CASMI2012 LC Challenge 3
COMMENT: DEPRECATED: 2019-05-03 considered noisy
COMMENT: SUPERSEEDED: SMI00035
COMMENT: LASTGIT: 71bfc632750600db42864739472d87bc6abd6e47

where MassBank-web would render the git hash to point to https://github.com/MassBank/MassBank-data/blob/71bfc632750600db42864739472d87bc6abd6e47/CASMI_2012/SMI00034.txt

The SUPERSEEDED comment would be human-readable only and would point to one or more replacement(s).
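
The link rendering described above could look roughly like this. It is only a sketch: the `COMMENT: LASTGIT:` subtag is the proposal from this comment, the repository-relative path has to be supplied by the caller, and `last_git_url` is a hypothetical helper, not MassBank-web code.

```python
from typing import Optional

# Base URL taken from the example link in this comment.
BLOB_BASE = "https://github.com/MassBank/MassBank-data/blob"


def last_git_url(record_text: str, path_in_repo: str) -> Optional[str]:
    """Build a link to the last full version of a record, or None if no LASTGIT comment exists."""
    for line in record_text.splitlines():
        if line.startswith("COMMENT: LASTGIT:"):
            # Split into "COMMENT", " LASTGIT", " <hash>" and keep the hash.
            commit = line.split(":", 2)[2].strip()
            return f"{BLOB_BASE}/{commit}/{path_in_repo}"
    return None
```

For the record above, `last_git_url(text, "CASMI_2012/SMI00034.txt")` would yield the link shown in the example.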

I also had the idea to distinguish between "DEPRECATED", which should be interpreted as "There are good reasons to not use this record" and "DELETED" which really means "This record is gone. Away. You can time travel if needed.".

Yours, Steffen

schymane commented 5 years ago

I don't think we should replace a record with a new one? Isn't that what versioning is meant to avoid? I also don't quite agree with a git state because, well, we have text files (not just MassBank-web) and I can't auto-interpret that lastgit bit into anything useable as a human ... as Tobias pointed out, even if we decide to deprecate records, they are history and it's good to have the data so that humans can look at them and agree or disagree; ChemSpider caused many issues for us by deprecating structures suddenly ... we should not do the same ...

meier-rene commented 5 years ago

Deprecated records are now implemented in the dev branch and rolled out on our dev server. One example:

https://msbi.ipb-halle.de/MassBank/RecordDisplay.jsp?id=JEL00034

The documentation has been added as well:

https://github.com/MassBank/MassBank-web/blob/dev/Documentation/MassBankRecordFormat.md#211-accession

I mark this as solved and will close it after the next rollout on MassBank.eu.

tsufz commented 4 years ago

@meier-rene Solved in fbbf5f36385efad85433784a8bc7365dae580cc7 and 78c7f90939949cc6b89eb0839d270c43394622d5?

If yes, we could close.