IQSS / dataverse.harvard.edu

Custom code for dataverse.harvard.edu and an issue tracker for the IQSS Dataverse team's operational work, for better tracking on https://github.com/orgs/IQSS/projects/34

Investigate error when trying to create dataset links for Dean Karlan's datasets #274

Open jggautier opened 5 months ago

jggautier commented 5 months ago

I used the Dataverse API to try to add links of datasets on Harvard Dataverse authored by Dean Karlan into two collections and the API returned a 403 error for a number of them.

3 dataset links not added to deankarlan collection

In mid-April 2024, I used the Dataverse API to try to add 54 dataset links into the collection at https://dataverse.harvard.edu/dataverse/deankarlan. Links for 51 of those datasets were created.

I can't add links for three of those datasets. When I try, the API returns a 403 error. In the datasetlinkingdataverse table in the Harvard Dataverse database, I can see the IDs of all 54 datasets, including the three that don't appear as links in the UI. Maybe that's related to the 403 error?
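To make the failing cases easier to reproduce, here is a minimal sketch of how the batch of link requests could be scripted so that each dataset's HTTP status is recorded. It assumes the Native API's "link a dataset" endpoint (`PUT /api/datasets/{id}/link/{alias}`) also accepts the `:persistentId` form with a `persistentId` query parameter, as most dataset endpoints do; that should be checked against the API Guide for the installed version. The function names and the `send` parameter are hypothetical and are not part of any Dataverse client library.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_link_request(server_url, dataset_pid, collection_alias, api_token):
    """Build (but do not send) a PUT request that links a dataset into a collection.

    Assumption: the link endpoint accepts ':persistentId' plus a
    'persistentId' query parameter in place of a database ID.
    """
    query = urllib.parse.urlencode({"persistentId": dataset_pid})
    url = (
        f"{server_url.rstrip('/')}/api/datasets/:persistentId/link/"
        f"{urllib.parse.quote(collection_alias)}?{query}"
    )
    return urllib.request.Request(
        url, method="PUT", headers={"X-Dataverse-key": api_token}
    )


def link_datasets(server_url, dataset_pids, collection_alias, api_token,
                  send=urllib.request.urlopen):
    """Try to link each dataset and return {pid: HTTP status code}.

    'send' is injectable so the loop can be exercised without a live server.
    """
    statuses = {}
    for pid in dataset_pids:
        request = build_link_request(server_url, pid, collection_alias, api_token)
        try:
            with send(request) as response:
                statuses[pid] = response.status
        except urllib.error.HTTPError as err:
            # A 403 (as reported in this issue) ends up here.
            statuses[pid] = err.code
    return statuses
```

Filtering the returned dictionary for the value 403 gives the list of datasets to investigate, and the same list can be retried after a reindex.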

The three datasets that I couldn't add as links into the collection are:

12 dataset links not added to DFEEP collection

In early June 2024, I used the Dataverse API to try to add 12 dataset links to the collection at https://dataverse.harvard.edu/dataverse/DFEEP. The API returned 403 errors and the links were not added.

Here are the DOIs for those 12 datasets:

More context

In the email thread at https://help.hmdc.harvard.edu/Ticket/Display.html?id=359122, Dean Karlan and I discussed how to make sure that the datasets he's an author of are linked into his collection and linked into the DFEEP collection, including the 54 datasets published so far and any datasets published in the future.

I also used the Saved Search feature to add links of any datasets he's an author of that are published in the future (see https://github.com/IQSS/dataverse.harvard.edu/issues/275 and https://github.com/IQSS/dataverse.harvard.edu/issues/277).

jggautier commented 5 months ago

I was able to use the Saved Search feature so that all Dean Karlan-authored datasets are added as links into his collection and the DFEEP collection. @sbarbosadataverse, @scolapasta and I wondered if Dataverse would then add links for the datasets I listed above. I checked today and the links have not been added.

cmbz commented 4 months ago

@scolapasta will investigate several cases to see if reindexing helps, then will perform additional troubleshooting if needed.

jggautier commented 4 months ago

As of this writing, links for all but one of the datasets that I listed in this issue's first comment are in the two collections. I'm not sure how or when this happened. @scolapasta wrote in a Slack message that it might have happened during some reindexing.

The unpublished dataset doi:10.7910/DVN/QT7IXR is the only dataset that still lacks a link in the deankarlan collection and in the DFEEP collection.

It's the only unpublished dataset that we needed to create links for, as of this writing. And I'm not able to create links for unpublished datasets when I try on Demo Dataverse. When I use the API to try, I get a 403 error. As far as I can tell, the User Guides, such as https://guides.dataverse.org/en/6.2/user/dataverse-management.html#dataset-linking, don't mention that links can't be created for unpublished datasets.
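One way to see ahead of time which datasets might hit this case is to look at each dataset's version state before trying to link it. The sketch below assumes the JSON returned by `GET /api/datasets/:persistentId/?persistentId=...` carries the state under `data.latestVersion.versionState` with values such as `DRAFT` and `RELEASED`; those field names reflect my understanding of the Native API response and should be verified against the guides. The helper names are hypothetical.

```python
def version_state(dataset_json):
    """Return the latest version's state, or None if the field is absent.

    Assumption: the state lives at data.latestVersion.versionState.
    """
    data = dataset_json.get("data") or {}
    latest = data.get("latestVersion") or {}
    return latest.get("versionState")


def split_by_publication(responses):
    """Split {pid: parsed JSON response} into released PIDs and everything else.

    Returns (released, other), where 'other' maps each remaining PID to its
    state (for example "DRAFT", or None when the state could not be read).
    """
    released, other = [], {}
    for pid, body in responses.items():
        state = version_state(body)
        if state == "RELEASED":
            released.append(pid)
        else:
            other[pid] = state
    return released, other
```

Datasets that land in the second group are the ones worth checking by hand before concluding that a 403 is unexpected.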

stevenwinship commented 1 week ago

@jggautier doi:10.7910/DVN/QT7IXR is missing required metadata fields (Text description and Subject). This is preventing the dataset from being published as well as linked.

You are correct that it does not need to be published to be linked, but it does need to be valid.

stevenwinship commented 1 week ago

[image]

jggautier commented 1 week ago

Thanks for confirming @stevenwinship!

I actually wasn't sure if it needed to be published or not, and I'm not sure why I wasn't able to link an unpublished dataset when I tried on Demo Dataverse back in June. I'll try again today.

The person who opened the GitHub issue at https://github.com/IQSS/dataverse/issues/10134 mentions an error message when they try to link an unpublished dataset. But that issue was written back in Nov. 2023 and maybe it's been fixed since then.

Not being able to link an unpublished dataset that is missing required fields sounds like a bug, right? Or maybe an oversight? I don't think it was intended that people wouldn't be able to create links of these sorts of datasets, and it isn't mentioned in the latest version of the guides.

jggautier commented 1 week ago

When I try on Demo Dataverse to create a link of an unpublished dataset into another collection, I get the same error message reported in https://github.com/IQSS/dataverse/issues/10134:

[Screenshot of the error message, 2024-11-04]

Getting an error message like this makes me think that it's intended that unpublished datasets can't be linked. And if that's the case, it doesn't seem like a bug that we're not able to link an unpublished dataset that is missing required fields.

@stevenwinship, could you write about why a dataset doesn't need to be published to be linked? Are you seeing something in the code that indicates that we should be able to create links to unpublished datasets?

jggautier commented 1 week ago

Hey again. I thought I'd mention here that eventually I plan to propose that we improve that error message and what's in the API Guides.

That error message says something about harvested datasets. I've used that endpoint to help users create links of datasets that have been harvested, and I tested it again today in Harvard Dataverse to make sure it's still possible.

So at the least, that last part of the error message about harvested datasets should be removed.

But we also need to know whether or not we intended for the endpoint to let users create links to unpublished datasets. If it should be possible, I think we'd just want to remove the message since it shouldn't appear.