MaastrichtU-IDS / xml2rdf

📄 A simple XML to RDF converter
http://d2s.semanticscience.org/
MIT License

Issue processing values with included html tags #19

Closed vemonet closed 5 years ago

vemonet commented 5 years ago

Some XML tag values contain HTML tags.

For example, in PubMed: when we read the AbstractText value, we get all the text within the <AbstractText/> tags except the content of the child tags (<i>, <sup>).

We should get everything within the <AbstractText/> tags, including the child tags and their values:

<Abstract>
  <AbstractText>An analytical method based on ultra-performance liquid chromatography with positive ion electrospray ionization (ESI) coupled with tandem mass spectrometry detection (UPLC-MS/MS) was developed and validated for the determination of therapeutic peptide desmopressin in human plasma. A desmopressin stable labeled isotope (desmopressin d<sub>8</sub>) was used as an internal standard. Analyte and the internal standard were extracted from 200 µL of human plasma <i>via</i> solid-phase extraction technique using Oasis WCX cartridges. The chromatographic separation was achieved on an Aquity UPLC HSS T3 column by using a gradient mixture of methanol and 1 mM ammonium formate buffer as the mobile phase. 
The calibration curve obtained was linear (
    <i>r</i>
    <sup>
      <i>2</i>
    </sup> ≥0.99) over the concentration range of 1.01-200 pg/mL. Method validation was performed as per FDA guidelines and the results met the acceptance criteria. The results of the intra- and inter-day precision and accuracy studies were well within the acceptable limits. The proposed method was successfully applied to pharmacokinetic studies in humans.
  </AbstractText>
</Abstract>

See https://ftp.ncbi.nlm.nih.gov/pubmed/sample-2019-01-01/example.xml
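To illustrate the symptom, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. It is not the xml2rdf code (which is a separate project); `SAMPLE` is a shortened, hypothetical stand-in for the record above. It shows that the text owned directly by an element excludes whatever sits inside its children:

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the PubMed record above.
SAMPLE = (
    "<Abstract><AbstractText>linear (<i>r</i><sup><i>2</i></sup>"
    " ≥0.99) over the range</AbstractText></Abstract>"
)

abstract_text = ET.fromstring(SAMPLE).find("AbstractText")

# Text before the first child is stored on the element itself.
leading_text = abstract_text.text

# Text following each child is stored on that child's .tail.
tails = [child.tail or "" for child in abstract_text]

# Combining only these pieces reproduces the reported symptom:
# the content of <i> and <sup> is missing.
own_text = leading_text + "".join(tails)
print(own_text)
```

Here the `r` and `2` wrapped in `<i>` and `<sup>` never appear in `own_text`, which matches the missing content described in this issue.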

Generated RDF on repository test (graph http://test/pubmed):

vemonet commented 5 years ago

This is normal behavior: either we go through the child elements to get their content, or PubMed should properly encode the markup. We should contact PubMed to ask them either way.
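Going through the children could look roughly like the following Python sketch, again using `xml.etree.ElementTree` rather than the actual xml2rdf implementation. The helper names `full_text` and `inner_xml` and the shortened `SAMPLE` are hypothetical, introduced only for illustration. One helper flattens everything to plain text, the other keeps the child tags in the value:

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for a PubMed AbstractText element.
SAMPLE = (
    "<Abstract><AbstractText>linear (<i>r</i><sup><i>2</i></sup>"
    " ≥0.99) over the range</AbstractText></Abstract>"
)


def full_text(element):
    """Concatenate all character data under element, descending into children."""
    return "".join(element.itertext())


def inner_xml(element):
    """Return the element's content with child tags kept, without the outer tag."""
    parts = [element.text or ""]
    for child in element:
        # tostring() also emits the child's tail text after its closing tag.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


abstract_text = ET.fromstring(SAMPLE).find("AbstractText")
print(full_text(abstract_text))
print(inner_xml(abstract_text))
```

Which variant fits depends on whether the generated RDF literal should carry the markup or only the readable text; that choice is not settled in this thread.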