CDLUC3 / dmptool

DMPTool version of the DMPRoadmap codebase
https://dmptool.org
MIT License

[Question] re: API documentation | retrieving JSON or non-PDF formats #389

Closed TSSlade closed 1 year ago

TSSlade commented 2 years ago

Per the API Fetch DMP documentation, it is possible to access the DMP's JSON metadata or retrieve the PDF.

Retrieving the PDF returns a binary file:

~/Documents/projects/dmp-tool$ http https://dmptool.org/api/v2/plans/83794.pdf Authorization:"Bearer $DMPTOOL_ACCESS_TOKEN"
HTTP/1.1 200 OK
Content-Disposition: inline; filename="Test_for_HEAL.pdf"

==other headers elided==

+-----------------------------------------+
| NOTE: binary data not shown in terminal |
+-----------------------------------------+
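
For anyone scripting this rather than using the command line, the request above can be sketched in Python with only the standard library. `build_plan_request` and `save_plan_pdf` are hypothetical helper names, and the idea that dropping the `.pdf` suffix returns the JSON metadata is an assumption to check against the API documentation.

```python
import urllib.request

API_BASE = "https://dmptool.org/api/v2/plans"  # base URL taken from the request above


def build_plan_request(plan_id, token, fmt=None):
    """Build an authenticated GET request for a plan.

    fmt="pdf" targets the PDF rendering shown above. fmt=None drops the
    suffix, which is ASSUMED (not confirmed here) to return the JSON metadata.
    """
    url = f"{API_BASE}/{plan_id}" + (f".{fmt}" if fmt else "")
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def save_plan_pdf(plan_id, token, path):
    """Download the PDF for a plan and write the raw bytes to `path`."""
    req = build_plan_request(plan_id, token, fmt="pdf")
    with urllib.request.urlopen(req) as resp:
        pdf_bytes = resp.read()  # binary PDF content, hence "not shown in terminal"
    with open(path, "wb") as fh:
        fh.write(pdf_bytes)
```

Calling `save_plan_pdf(83794, token, "plan.pdf")` with the token from `DMPTOOL_ACCESS_TOKEN` would mirror the call above, but it only yields the binary PDF, not structured content.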

Is it possible to retrieve the contents of the DMP in non-PDF format via the API?

The Download function provided via the user interface includes several options, some of which actually return the contents of the complete DMP. (The download-as-JSON option still only provides metadata, however.)

[Screenshot: Download options in the user interface]

briri commented 2 years ago

Hi @TSSlade, it is not possible at the moment. We've considered different ways of including the DMP content in the JSON output, but the RDA common metadata standard for DMPs doesn't align nicely with our current data model.

What is your use case? Do you need the DMP narrative in a structured format, or just a dump of the text (similar to the text file download in your screenshot)?

TSSlade commented 2 years ago

@briri - the use case is basically parsing the responses to kick off a workflow on our end. Picture something like:

"They listed 'intended repository' as 'unknown'. Let's reach out and offer to talk through their options with them."

A JSON format for the content of the DMP (as opposed to the metadata about the DMP) would be ideal. In a pinch we could make do with the text output (slight hassle) or existing PDF (likely a bigger hassle, depending on how it's getting built behind the scenes), but the effort required to build those parsers might be roughly the same as supporting a native JSON export of the content.

Either way, the result could be a public resource; it's just a matter of hackiness and integration with the broader effort.

briri commented 2 years ago

Thanks for the explanation. The JSON standard we're using has a place for some of the narrative elements (e.g. security and privacy statements), but not for everything, and the rest is hard to map accurately because each DMP template is structured differently.

Would you be willing to submit a PR that enables this functionality?

The DB structure to get at this information is a bit complicated:

It would make sense to include the phase/section titles and descriptions to help group the questions/answers appropriately. I think something like:

{
  "dmp": {

    /* Other DMP metadata */

    "dmproadmap_narrative": {
      "phases": [
        {
          "title": "Write plan",
          "sections": [
            {
              "title": "Data Preservation",
              "position": 1,
              "description": "Lorem ipsum",
              "questions": [
                {
                  "question": "What is your favorite color?",
                  "position": 1,
                  "answer": "Green"
                }
              ]
            }
          ]
        }
      ]
    }
  }
}

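
For the workflow use case described earlier in the thread (flagging answers such as "unknown" for follow-up), a consumer of this proposed structure might look roughly like the Python sketch below. It assumes `dmproadmap_narrative` is a JSON object holding a `phases` array, which is only a proposal here and not part of the current API. `find_flagged_answers`, the trigger values, and the sample question about an intended repository are all hypothetical.

```python
import json


def find_flagged_answers(dmp_json, triggers=("unknown",)):
    """Return (phase, section, question, answer) tuples whose answer matches a trigger.

    Assumes the PROPOSED (not yet implemented) layout:
    dmp -> dmproadmap_narrative -> phases -> sections -> questions.
    """
    wanted = {t.lower() for t in triggers}
    narrative = dmp_json.get("dmp", {}).get("dmproadmap_narrative", {})
    hits = []
    for phase in narrative.get("phases", []):
        # Walk sections and questions in their declared display order.
        sections = sorted(phase.get("sections", []), key=lambda s: s.get("position", 0))
        for section in sections:
            questions = sorted(section.get("questions", []), key=lambda q: q.get("position", 0))
            for q in questions:
                answer = (q.get("answer") or "").strip()
                if answer.lower() in wanted:
                    hits.append((phase.get("title"), section.get("title"),
                                 q.get("question"), answer))
    return hits


# Hypothetical sample data in the proposed shape.
sample = json.loads("""
{
  "dmp": {
    "dmproadmap_narrative": {
      "phases": [{
        "title": "Write plan",
        "sections": [{
          "title": "Data Preservation",
          "position": 1,
          "questions": [
            {"question": "What is your intended repository?", "position": 1, "answer": "unknown"},
            {"question": "What is your favorite color?", "position": 2, "answer": "Green"}
          ]
        }]
      }]
    }
  }
}
""")

for hit in find_flagged_answers(sample):
    print(hit)
```

Keeping the phase and section titles in each result is what makes the follow-up message meaningful, which is the same reason for including them in the export itself.
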
TSSlade commented 2 years ago

@briri - in principle, yes, I think we'd be open to working on a PR for that. Let me get it verified with the Powers That Be. We might have to sequence it after the current PR that's in progress, but if you're okay with that, I think we'd be okay with that.

briri commented 2 years ago

Yes, that would be fine and makes sense. Thanks @TSSlade

mariapraetzellis commented 2 years ago

@TSSlade Is the in-progress PR mentioned in the comment above related to the repository selector feature?

TSSlade commented 2 years ago

@mariapraetzellis - the PR is not currently in progress. My question to @briri was more in the vein of "If I were to find someone who could work on this, would it be accepted? Or would it be misaligned with your interests/priorities/things you're willing to support?"

If the latter, I wouldn't bother trying to identify someone to work on it.

The repo selector feature is wholly separate and remains ongoing.