FAIRClinical / ClinicalCorporaWorkflow


New BioC field #2

Closed Thomas-Rowlands closed 1 month ago

Thomas-Rowlands commented 1 month ago

I propose we add a "textsource" parameter to the BioC output file when we process images only, with its value being the URL of the API service that supplied the text. This parameter goes at the same level as "inputfile", i.e. on each object in the documents array.

For example, if the text comes from the fetch API (addition highlighted):

"documents": [
    {
        "id": 1,
        "inputfile": "PMC10021083_supplementary/Raw/sj-jpg-1-tah-10.1177_20406207231155991.jpg",
        "textsource": "https://sibils.text-analytics.ch/api/fetch?ids=PMC10021083_sj-jpg-1-tah-10.1177_20406207231155991.jpg&col=suppdata",
        "infons": {},
        "passages": [
            {
            ...

For example, if the text comes from the OCR API (addition highlighted):

"documents": [
    {
        "id": 1,
        "inputfile": "PMC2365968_supplementary/Raw/1752-1947-2-112-S3.tiff",
        "textsource": "https://ocrweb.text-analytics.ch/",
        "infons": {},
        "passages": [
            {
            ...
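
As a rough illustration of the proposal, here is a minimal Python sketch of how a "textsource" key could be placed directly after "inputfile" in a BioC document object. The helper name `add_textsource` and the constant `OCR_SERVICE_URL` are hypothetical and not part of the existing workflow code; the URL is the one quoted in the OCR example above, and the empty "passages" list stands in for the elided content.

```python
import json

# Hypothetical constant; the URL is the one quoted in the OCR example above.
OCR_SERVICE_URL = "https://ocrweb.text-analytics.ch/"


def add_textsource(document: dict, textsource: str) -> dict:
    """Return a copy of a BioC document dict with "textsource" right after "inputfile"."""
    updated = {}
    for key, value in document.items():
        if key == "textsource":
            continue  # skip any stale value; it is re-added below
        updated[key] = value
        if key == "inputfile":
            updated["textsource"] = textsource
    # Fall back to appending when the document has no "inputfile" key.
    updated.setdefault("textsource", textsource)
    return updated


# Placeholder document: "passages" is left empty because the original is elided.
document = {
    "id": 1,
    "inputfile": "PMC2365968_supplementary/Raw/1752-1947-2-112-S3.tiff",
    "infons": {},
    "passages": [],
}

result = add_textsource(document, OCR_SERVICE_URL)
print(json.dumps({"documents": [result]}, indent=4))
```

Returning a copy rather than mutating the input is only one possible design choice; it keeps the key order explicit so that "textsource" serialises next to "inputfile", as in the examples above.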