EBIBioSamples / biosamples-v4

The source code for the new version of the EMBL-EBI BioSamples database
https://www.ebi.ac.uk/biosamples/
Apache License 2.0

Issue observed while testing sample validation in BioSamples, Webin-REST tests #728

Closed: amnonkhen closed 2 months ago

amnonkhen commented 2 months ago

While preparing to deploy Webin REST in dev, where it will connect to BioSamples in dev, the following issue was observed:

Sample metadata:

<?xml version = '1.0' encoding = 'UTF-8'?>
<SAMPLE_SET>
    <SAMPLE alias="" center_name="">
        <TITLE>FTP2_ITS</TITLE>
        <SAMPLE_NAME>
            <TAXON_ID>410658</TAXON_ID>
            <SCIENTIFIC_NAME>soil metagenome</SCIENTIFIC_NAME>
        </SAMPLE_NAME>
        <DESCRIPTION>Fort Tryon Park 2. ITS</DESCRIPTION>
        <SAMPLE_ATTRIBUTES>
            <SAMPLE_ATTRIBUTE>
                <TAG>investigation type</TAG>
                <VALUE>metagenome</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>project name</TAG>
                <VALUE>nyc_parks_medians</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>sequencing method</TAG>
                <VALUE>Illumina</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>collection date</TAG>
                <VALUE>2013-05-31</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>environmental package</TAG>
                <VALUE>soil</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>geographic location (latitude)</TAG>
                <VALUE>40.86622</VALUE>
                <UNITS>DD</UNITS>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>geographic location (longitude)</TAG>
                <VALUE>-73.93168</VALUE>
                <UNITS>DD</UNITS>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>geographic location (country and/or sea)</TAG>
                <VALUE>USA</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>geographic location (depth)</TAG>
                <VALUE>10.00</VALUE>
                <UNITS>m</UNITS>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>soil environmental package</TAG>
                <VALUE>soil</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>environment (biome)</TAG>
                <VALUE>ENVO:urban biome</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>environment (feature)</TAG>
                <VALUE>soil</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>environment (material)</TAG>
                <VALUE>soil</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>depth</TAG>
                <VALUE>0.1</VALUE>
                <UNITS>m</UNITS>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>geographic location (elevation)</TAG>
                <VALUE>1</VALUE>
                <UNITS>m</UNITS>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>broad-scale environmental context</TAG>
                <VALUE>soil</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>environmental medium</TAG>
                <VALUE>soil</VALUE>
            </SAMPLE_ATTRIBUTE>
            <SAMPLE_ATTRIBUTE>
                <TAG>ENA-CHECKLIST</TAG>
                <VALUE>ERC000022</VALUE>
            </SAMPLE_ATTRIBUTE>
        </SAMPLE_ATTRIBUTES>
    </SAMPLE>
</SAMPLE_SET>

Sample contains:

<SAMPLE_ATTRIBUTE>
    <TAG>geographic location (elevation)</TAG>
    <VALUE>1</VALUE>
    <UNITS>m</UNITS>
</SAMPLE_ATTRIBUTE>

"elevation" and "geographic location (elevation)" are synonyms; see the checklist:

        <FIELD>
          <LABEL>elevation</LABEL>
          <SYNONYM>geographic location (elevation)</SYNONYM>
          <NAME>elevation</NAME>
          <DESCRIPTION>The elevation of the sampling site as measured by the vertical distance from mean sea level.</DESCRIPTION>
          <UNITS>
            <UNIT>m</UNIT>
          </UNITS>
          <FIELD_TYPE>
            <TEXT_FIELD>
              <REGEX_VALUE>([+-]?(0|((0\.)|([1-9][0-9]*\.?))[0-9]*)([Ee][+-]?[0-9]+)?)|((^not collected$)|(^not provided$)|(^restricted access$)|(^missing: control sample$)|(^missing: sample group$)|(^missing: synthetic construct$)|(^missing: lab stock$)|(^missing: third party data$)|(^missing: data agreement established pre-2023$)|(^missing: endangered species$)|(^missing: human-identifiable$))</REGEX_VALUE>
            </TEXT_FIELD>
          </FIELD_TYPE>
          <MANDATORY>mandatory</MANDATORY>
          <MULTIPLICITY>multiple</MULTIPLICITY>
        </FIELD>
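
To make the point concrete, here is a minimal Python sketch that reads the checklist field quoted above and checks the sample's attribute against it. The helper `check_attribute` and the trimmed `FIELD_XML` copy are illustrative only, and using a full-string match for the regex is an assumption; this is not how BioSamples or Webin REST actually validates.

```python
import re
import xml.etree.ElementTree as ET

# The checklist field quoted above, trimmed to the elements this sketch reads.
FIELD_XML = r"""
<FIELD>
  <LABEL>elevation</LABEL>
  <SYNONYM>geographic location (elevation)</SYNONYM>
  <FIELD_TYPE>
    <TEXT_FIELD>
      <REGEX_VALUE>([+-]?(0|((0\.)|([1-9][0-9]*\.?))[0-9]*)([Ee][+-]?[0-9]+)?)|((^not collected$)|(^not provided$)|(^restricted access$)|(^missing: control sample$)|(^missing: sample group$)|(^missing: synthetic construct$)|(^missing: lab stock$)|(^missing: third party data$)|(^missing: data agreement established pre-2023$)|(^missing: endangered species$)|(^missing: human-identifiable$))</REGEX_VALUE>
    </TEXT_FIELD>
  </FIELD_TYPE>
  <MANDATORY>mandatory</MANDATORY>
</FIELD>
"""

field = ET.fromstring(FIELD_XML)
label = field.findtext("LABEL")
# Names under which the field may appear: its label plus every declared synonym.
accepted_names = {label} | {s.text for s in field.findall("SYNONYM")}
value_pattern = re.compile(field.findtext("FIELD_TYPE/TEXT_FIELD/REGEX_VALUE"))


def check_attribute(tag: str, value: str) -> tuple[bool, bool]:
    """Return (tag is the label or a synonym, value matches the field regex)."""
    name_ok = tag in accepted_names
    # Assumption: the whole value must match; the real validator may differ.
    value_ok = value_pattern.fullmatch(value) is not None
    return name_ok, value_ok


# Tag and value taken from the sample's SAMPLE_ATTRIBUTE shown earlier.
print(check_attribute("geographic location (elevation)", "1"))  # (True, True)
```

Under these assumptions the attribute is acceptable by both name and value, which suggests the failure is about how the name is matched rather than about the value.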

Error in BioSamples:

12:34:07.431 [Test worker] INFO uk.ac.ebi.ena.sra.SRALoader - load processed in 678ms
12:34:07.432 [Test worker] INFO uk.ac.ebi.ena.sra.utils.Common - *|ERROR: 2024_08_21_12_34_05_619__EBI_SUB_SRA_TEST_ALIAS__1 failed validation due to should have required property 'local environmental context'
12:34:07.432 [Test worker] INFO uk.ac.ebi.ena.sra.utils.Common - *|ERROR: 2024_08_21_12_34_05_619__EBI_SUB_SRA_TEST_ALIAS__1 failed validation due to should have required property 'elevation'
12:34:07.432 [Test worker] INFO uk.ac.ebi.ena.sra.utils.Common - *|ERROR: Failed to submit samples to BioSamples
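
The two "required property" errors are consistent with property names being compared literally, without resolving checklist synonyms. That is an assumption about the validator, not something confirmed here. A rough Python sketch of the difference, using the tags from the sample above and a hypothetical helper `missing_properties`:

```python
# Tags copied from the SAMPLE_ATTRIBUTE elements in the sample metadata.
SAMPLE_TAGS = {
    "investigation type", "project name", "sequencing method",
    "collection date", "environmental package",
    "geographic location (latitude)", "geographic location (longitude)",
    "geographic location (country and/or sea)", "geographic location (depth)",
    "soil environmental package", "environment (biome)",
    "environment (feature)", "environment (material)", "depth",
    "geographic location (elevation)", "broad-scale environmental context",
    "environmental medium", "ENA-CHECKLIST",
}

# Property names copied from the two validation errors in the log.
REQUIRED = ["local environmental context", "elevation"]

# The only synonym quoted in this issue (from the checklist FIELD).
SYNONYMS = {"geographic location (elevation)": "elevation"}


def missing_properties(required, tags, synonyms=None):
    """List required names absent from tags, optionally resolving synonyms first."""
    synonyms = synonyms or {}
    resolved = {synonyms.get(tag, tag) for tag in tags}
    return [name for name in required if name not in resolved]


# Literal matching reproduces both errors from the log.
print(missing_properties(REQUIRED, SAMPLE_TAGS))
# -> ['local environmental context', 'elevation']

# Resolving the checklist synonym clears the 'elevation' error.
print(missing_properties(REQUIRED, SAMPLE_TAGS, SYNONYMS))
# -> ['local environmental context']
```

With only the synonym quoted in this issue, 'local environmental context' is still reported as missing; whether one of the sample's other tags is a checklist synonym for it is not shown here.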

amnonkhen commented 2 months ago

@theisuru @dgupta Is this still relevant?

amnonkhen commented 2 months ago

dupe of ebi-ait/checklist#76