Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License
4 stars 1 forks source link

Allow HMDB ID to be optional for compounds #568

Open lparsons opened 2 years ago

lparsons commented 2 years ago

FEATURE REQUEST

Inspiration

Some compounds being studied are novel / proprietary, and do not have HMDB IDs. An example is the drug Alpelisib (see RT #49347 )

Description

One simple solution would be to allow the HMDB field to be blank / null. If this is done, we should endeavor to require HMDB IDs whenever possible to help ensure consistency.

Alternatives

Using other identifiers in addition to HMDB could help, though there is no guarantee that all compounds will have such an ID (see #226).

Dependencies

Comment


ISSUE OWNER SECTION

Assumptions

none

Requirements

Limitations

Affected Components

DESIGN

Interface Change description

Warnings will be printed for all records missing an HMDB ID.

Code Change Description

The hmdb_id field will be made null=True and blank=True. I will buffer records with missing HMDB IDs (when the option isn't there) and issue a warning if the buffer is populated at the end.

Tests

jcmatese commented 2 years ago

probably duplicate of #226

hepcat72 commented 2 years ago

@lparsons - would you take a look at the design I entered and approve if OK?

lparsons commented 2 years ago

I'm not sure that requiring the extra option --no-hmdb-ok will be all that helpful. It will need to be supplied if even one of the compounds being loaded has no HMDB. If we load a comprehensive list, then it will always need to be included. Personally, I would prefer to skip the option and instead report a summary of compounds loaded that includes the number of compounds with no HMDB id loaded (using a warning color to make it obvious).

@jcmatese Were you able to get an HMDB ID for the experimental compound reported?

@mneinast Do you have a sense of how important supporting compounds without HMDB IDs will be?

hepcat72 commented 2 years ago

I'm not sure that requiring the extra option --no-hmdb-ok will be all that helpful.

The point is to force the user performing the load to explicitly say, "yes, I know that there is a missing HMDB ID, and I choose to proceed". I thought that was kind of the point. But if you think a warning is sufficient, then that's fine by me.

It will need to be supplied if even one of the compounds being loaded has no HMDB.

Yes, that is what is meant under Limitations by:

  • The hidden option will not be able to be applied to specific records

If we load a comprehensive list, then it will always need to be included.

Yeah, that's a good point.

Personally, I would prefer to skip the option and instead report a summary of compounds loaded that includes the number of compounds with no HMDB id loaded (using a warning color to make it obvious).

I'll edit the design to do this.

@jcmatese Were you able to get an HMDB ID for the experimental compound reported?

@mneinast Do you have a sense of how important supporting compounds without HMDB IDs will be?

hepcat72 commented 2 years ago

@lparsons - OK, ready for re-review

lparsons commented 2 years ago

The point is to force the user performing the load to explicitly say, "yes, I know that there is a missing HMDB ID, and I choose to proceed". I thought that was kind of the point. But if you think a warning is sufficient, then that's fine by me.

That does make some sense. I'm honestly not sure how helpful it will be in practice. I do worry a bit that once we allow compounds without any external ID, there won't be enough of an incentive to figure one out and some will get left blank even though they could be filled in. That said, I don't believe we actually have an example compound at the moment.

This is a somewhat major change and is no longer pressing (I think), so let's also get thoughts from @fkang-pu, @jcmatese, and @mneinast.

hepcat72 commented 2 years ago

The point is to force the user performing the load to explicitly say, "yes, I know that there is a missing HMDB ID, and I choose to proceed". I thought that was kind of the point. But if you think a warning is sufficient, then that's fine by me.

That does make some sense. I'm honestly not sure how helpful it will be in practice. I do worry a bit that once we allow compounds without any external ID, there won't be enough of an incentive to figure one out and some will get left blank even though they could be filled in. That said, I don't believe we actually have an example compound at the moment.

This is a somewhat major change and is no longer pressing (I think), so let's also get thoughts from @fkang-pu, @jcmatese, and @mneinast.

We don't do this currently, but if the validation page was made to run load_study instead of the sample and accucor scripts (i.e. it runs the load compounds script), then it would be more useful there... for our internal processing, we would probably, as you pointed out, always have to have that option present. So perhaps instead of a --no-hmdb-ok, we could have an inverse option to "error on missing HMDB ID" that could be used on the validation page... just a thought. Otherwise, it would be a warning.

jcmatese commented 2 years ago

I was able to find a HMDB ID

hepcat72 commented 2 years ago

This is [...] no longer pressing (I think)

If that's the case, should we change the required label to optional?

lparsons commented 2 years ago

If that's the case, should we change the required label to optional?

Sure, was waiting to hear from John.

hepcat72 commented 2 years ago

Alright, so since this isn't required, I will take myself off because I will focus on required issues, but WRT the design, are we good? Perhaps we should have a task:design-approved. In fact, maybe all the task labels should have a separate design label category, with:

lparsons commented 2 years ago

Alright, so since this isn't required, I will take myself off because I will focus on required issues, but WRT the design, are we good?

I think the design makes sense to me for now, but if/when we run into a real world example, it could affect how we want to handle things. For instance, do we want to require some external identifier, just not limited to HMDB?

hepcat72 commented 2 years ago

FWIW, IMO, that speaks to a larger question. While the creation of this issue may have been motivated by a larger concern, it seems out of scope, so what I would do is create another issue that addresses the requirement of external identifiers and then list that issue as a dependency on this issue.