Context
We are not currently persisting developmental_stage to the sample in the database. Per the docs, we are supposed to be harmonizing this key as refinebio_developmental_stage; it has been temporarily removed from the docs for now. The harmonizer already parses this value from the sample metadata and assigns it to the model, but on save the value is not persisted to the database.
Problem or idea
Adding the field is pretty trivial; we just need to add something like developmental_stage = models.CharField(max_length=255, blank=True) to the Sample model.
The harder part will be backfilling the existing samples, possibly by just rerunning the harmonizer on all samples in some fashion. This is not computationally difficult, but in order to be good citizens we should determine the ideal self-imposed rate limit so that we can accomplish this in a reasonable amount of time without thrashing the ENA API endpoints. So far I have been unable to find a way to fetch multiple biosample responses tied to a specific study in a single query. So while we can fetch an entire experiment's worth of sample metadata at once, we still need to fetch biosample metadata one sample at a time.
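As a starting point for the rate-limit discussion, here is a minimal sketch of a throttled backfill loop. Everything in it is an assumption for illustration: backfill_with_rate_limit, fetch_one, and the default of 5 requests per second are hypothetical names and values, not existing refine.bio code or a limit published by ENA. The fetch function is passed in, so the real biosample lookup can be swapped in later.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def backfill_with_rate_limit(
    accessions: Iterable[str],
    fetch_one: Callable[[str], Optional[str]],
    max_requests_per_second: float = 5.0,  # assumed value, to be tuned
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Optional[str]]:
    """Call fetch_one once per accession without exceeding the request rate.

    fetch_one is a hypothetical callable that returns the developmental
    stage for one biosample accession, or None if it has none.
    """
    min_interval = 1.0 / max_requests_per_second
    results: Dict[str, Optional[str]] = {}
    last_request_at: Optional[float] = None

    for accession in accessions:
        if last_request_at is not None:
            # Sleep only for the part of the interval not already used up.
            wait = min_interval - (clock() - last_request_at)
            if wait > 0:
                sleep(wait)
        last_request_at = clock()
        results[accession] = fetch_one(accession)

    return results
```

For a rough sense of scale: at 5 requests per second, an illustrative 1,000,000 samples would need about 200,000 seconds, or roughly 2.3 days, of wall-clock time for a single worker.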
Solution or next step
Add the developmental_stage attribute to sample
Add developmental_stage to serializers that contain scientific keys
Add refinebio_developmental_stage to the Sample.to_metadata_dict method (this will add it to the downloaded metadata.json)
Determine steps required to backfill existing samples to populate this attribute.
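To make the metadata step concrete, here is a rough sketch of the shape the change could take. SampleStub is a hypothetical stand-in written as a plain dataclass so it runs without Django; it is not the real Sample model, and the refinebio_accession_code key is only assumed here to show the new key sitting next to an existing one.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SampleStub:
    """Hypothetical stand-in for the Sample model; not the real class."""

    accession_code: str
    # Mirrors the proposed CharField(max_length=255, blank=True):
    # an empty string means no developmental stage was harmonized.
    developmental_stage: str = ""

    def to_metadata_dict(self) -> Dict[str, str]:
        # Only the keys needed for the illustration are included here.
        return {
            "refinebio_accession_code": self.accession_code,
            "refinebio_developmental_stage": self.developmental_stage,
        }
```

With this shape, samples that have not been backfilled yet would export an empty string for the new key rather than omitting it, which keeps the metadata.json columns consistent across samples.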