Open lparsons opened 2 years ago
probably duplicate of #226
@lparsons - would you take a look at the design I entered and approve if OK?
I'm not sure that requiring the extra option --no-hmdb-ok
will be all that helpful. It will need to be supplied if even one of the compounds being loaded has no HMDB. If we load a comprehensive list, then it will always need to be included. Personally, I would prefer to skip the option and instead report a summary of compounds loaded that includes the number of compounds with no HMDB id loaded (using a warning color to make it obvious).
@jcmatese Were you able to get an HMDB ID for the experimental compound reported?
@mneinast Do you have a sense of how important supporting compounds without HMDB IDs will be?
I'm not sure that requiring the extra option
--no-hmdb-ok
will be all that helpful.
The point is to force the user performing the load to explicitly say, "yes, I know that there is a missing HMDB ID, and I choose to proceed". I thought that was kind of the point. But if you think a warning is sufficient, then that's fine by me.
It will need to be supplied if even one of the compounds being loaded has no HMDB.
Yes, that is what is meant under Limitations by:
- The hidden option will not be able to be applied to specific records
If we load a comprehensive list, then it will always need to be included.
Yeah, that's a good point.
Personally, I would prefer to skip the option and instead report a summary of compounds loaded that includes the number of compounds with no HMDB id loaded (using a warning color to make it obvious).
I'll edit the design to do this.
@jcmatese Were you able to get an HMDB ID for the experimental compound reported?
@mneinast Do you have a sense of how important supporting compounds without HMDB IDs will be?
@lparsons - OK, ready for re-review
The point is to force the user performing the load to explicitly say, "yes, I know that there is a missing HMDB ID, and I choose to proceed". I thought that was kind of the point. But if you think a warning is sufficient, then that's fine by me.
That does make some sense. I'm honestly not sure how helpful it will be in practice. I do worry a bit that once we allow compounds without any external ID, there won't be enough of an incentive to figure one out and some will get left blank even though they could be filled in. That said, I don't believe we actually have an example compound at the moment.
This is a somewhat major change and is no longer pressing (I think), so let's also get thoughts from @fkang-pu, @jcmatese, and @mneinast.
The point is to force the user performing the load to explicitly say, "yes, I know that there is a missing HMDB ID, and I choose to proceed". I thought that was kind of the point. But if you think a warning is sufficient, then that's fine by me.
That does make some sense. I'm honestly not sure how helpful it will be in practice. I do worry a bit that once we allow compounds without any external ID, there won't be enough of an incentive to figure one out and some will get left blank even though they could be filled in. That said, I don't believe we actually have an example compound at the moment.
This is a somewhat major change and is no longer pressing (I think), so let's also get thoughts from @fkang-pu, @jcmatese, and @mneinast.
We don't do this currently, but if the validation page was made to run load_study instead of the sample and accucor scripts (i.e. it runs the load compounds script), then it would be more useful there... for our internal processing, we would probably, as you pointed out, always have to have that option present. So perhaps instead of a --no-hmdb-ok
, we could have an inverse option to "error on missing HMDB ID" that could be used on the validation page... just a thought. Otherwise, it would be a warning.
This is [...] no longer pressing (I think)
If that's the case, should we change the required
label to optional
?
If that's the case, should we change the
required
label tooptional
?
Sure, was waiting to hear from John.
Alright, so since this isn't required, I will take myself off because I will focus on required issues, but WRT the design, are we good? Perhaps we should have a task:design-approved
. In fact, maybe all the task labels should have a separate design label category, with:
design:1-needs-proposal
design:2-needs-review
design:3-approved
(maybe also "changes-requested"?)Alright, so since this isn't required, I will take myself off because I will focus on required issues, but WRT the design, are we good?
I think the design makes sense to me for now, but if/when we run into a real world example, it could affect how we want to handle things. For instance, do we want to require some external identifier, just not limited to HMDB?
FWIW, IMO, that speaks to a larger question. While the creation of this issue may have been motivated by a larger concern, it seems out of scope, so what I would do is create another issue that addresses the requirement of external identifiers and then list that issue as a dependency on this issue.
FEATURE REQUEST
Inspiration
Some compounds being studied are novel / proprietary, and do not have HMDB IDs. An example is the drug Alpelisib (see RT #49347 )
Description
One simple solution would be to allow the HMDB field to be blank / null. If this is done, we should endeavor to require HMDB IDs whenever possible to help ensure consistency.
Alternatives
Using other identifiers in addition to HMDB could help, though there is no guarantee that all compounds will have such an ID (see #226).
Dependencies
Comment
ISSUE OWNER SECTION
Assumptions
none
Requirements
Limitations
Affected Components
DataRepo/utils/compounds_loader.py
DataRepo/models/compound.py
DESIGN
Interface Change description
Warnings will be printed for all records missing an HMDB ID.
Code Change Description
The hmdb_id field will be made null=True and blank=True. I will buffer records with missing HMDB IDs (when the option isn't there) and issue a warning if the buffer is populated at the end.
Tests