DataONEorg / rdataone

R package for reading and writing data at DataONE data repositories
http://doi.org/10.5063/F1M61H5X

xmlParseEntityRef failed somewhere in uploadDataPackage #277

Closed: earnaud closed this 2 years ago

earnaud commented 3 years ago

Upon executing the following code:

packageId <- try(
    dataone::uploadDataPackage(
        d1c,
        dp,
        public = TRUE,
        accessRules = accessRules, # default
        quiet = FALSE,
        packageId = paste0("urn:uuid:", uuid::UUIDgenerate())
    )
)

I get the following error:

Error : 1: xmlParseEntityRef: no name
2: xmlParseEntityRef: no name
3: StartTag: invalid element name
4: xmlParseEntityRef: no name
5: xmlParseEntityRef: no name
6: xmlParseEntityRef: no name
7: xmlParseEntityRef: no name
8: Opening and ending tag mismatch: script line 18 and body
9: Opening and ending tag mismatch: body line 18 and html
10: EndTag: '</' not found

I do not know where this error comes from, except that it appears after the "Uploading a new package to member node [...]" message.

Some more information about my attempt:

Thanks in advance for any answers!

amoeba commented 3 years ago

Hi @earnaud, thanks for the report. This looks similar to what @yvanlebras was talking about the other day.

I think what's going on here is that your MetacatUI installation is at https://data.test.pndb.fr but your Metacat installation is at https://test.pndb.fr/metacat/d1/mn/v2/node. The former is what's registered in your DataONE node document, while the latter is what your code above needs in order to work.

You might try running this before your code above to override the current registration info:

d1c@mn <- MNode("https://test.pndb.fr/metacat/d1/mn/v2/node")

What do you think?

amoeba commented 3 years ago

Hey @earnaud, I just touched base with @taojing2002, who helped onboard PNDB, and he thinks something might have changed in your configuration. He's going to reach out to your team via email. Feel free to keep chatting here or on the NCEAS Slack in #knb.

@gothub I know this is a bit of an edge case, but do you think the MNode constructor could use some more error handling? That would help catch the issue in @earnaud's code and mine a bit earlier (at D1Client or MNode instantiation).
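The parser messages in the original report (xmlParseEntityRef, "Opening and ending tag mismatch: script ... body ... html") are what an XML parser typically produces when it is handed an HTML page, such as a web UI, instead of a Metacat API response. One possible form for the extra error handling is to sniff the response body before parsing it. This is a minimal base R sketch; `looks_like_html` and `check_mn_response` are hypothetical names, not functions in the dataone package, and where such a check would hook into MNode is left open.

```r
# Hypothetical helper (not part of the dataone package): returns TRUE when a
# response body looks like an HTML page rather than an XML API response.
looks_like_html <- function(body) {
  # Only inspect the start of the body, after any leading whitespace
  start <- substr(trimws(body, which = "left"), 1, 200)
  grepl("^(<!DOCTYPE html|<html)", start, ignore.case = TRUE)
}

# Hypothetical guard: fail early with a readable message instead of letting
# the XML parser emit xmlParseEntityRef / tag-mismatch errors.
check_mn_response <- function(body, endpoint) {
  if (looks_like_html(body)) {
    stop(sprintf(
      "Endpoint %s returned HTML instead of XML; check that it points at the Metacat API (e.g. .../metacat/d1/mn/v2), not a web UI.",
      endpoint
    ), call. = FALSE)
  }
  invisible(body)
}
```

The check is deliberately shallow: it only looks at the first few hundred characters, so it adds almost no cost to normal XML responses.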

earnaud commented 3 years ago

Hi, I just discovered your nice answers. Thanks a lot for the help! We will try out some solutions soon and let you know how it goes.

earnaud commented 3 years ago

Well, things are getting quite... strange. I no longer get an error, but I get a strange UUID for my metadata file:

urn:node:mnTestARCTICNSF Arctic Data Center Test RepositoryThe National Science Foundation's Arctic Data Center operates as the primary data repository supporting the NSF Arctic community for data preservation and data discovery.https://test.arcticdata.io/metacat/d1/mn2021-08-11T12:07:14.593+00:002021-08-11T12:07:14.593+00:00CN=urn:node:mnTestARCTIC,DC=dataone,DC=orgCN=Christopher Jones A2108,O=Google,C=US,DC=cilogon,DC=orgsuccess2.15.0

I suspect some odd configuration is at work. Anyway, the initial issue seems to be progressing: I no longer get an explicit error, but I can't find the data package in the targeted Metacat. I don't think that part is relevant to this issue.

Thanks for your precious help @amoeba !

earnaud commented 3 years ago

These strange UUIDs also appear with the Arctic Data Center test repository as soon as "/node" is appended to the endpoint:

> mn@identifier
[1] "urn:node:mnTestARCTIC"
> mn@endpoint
[1] "https://test.arcticdata.io/metacat/d1/mn/v2/node"
> generateIdentifier(mn, "uuid")
[1] "urn:node:mnTestARCTICNSF Arctic Data Center Test RepositoryThe National Science Foundation's Arctic Data Center operates as the primary data repository supporting the NSF Arctic community for data preservation and data discovery.https://test.arcticdata.io/metacat/d1/mn2021-08-11T12:25:57.439+00:002021-08-11T12:25:57.439+00:00CN=urn:node:mnTestARCTIC,DC=dataone,DC=orgCN=Christopher Jones A2108,O=Google,C=US,DC=cilogon,DC=orgsuccess2.15.0"

IMO, catching this would be a useful addition to the error handling you mentioned earlier.
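As the console output above shows, an endpoint ending in "/node" makes generateIdentifier return the text of the node document instead of an identifier. If an endpoint URL has to be hardcoded, a defensive normalization step could strip that suffix. This is a minimal base R sketch; `normalize_mn_endpoint` is a hypothetical helper, not part of the dataone package.

```r
# Hypothetical helper (not part of the dataone package): strips trailing
# slashes and a trailing "/node" path segment from a member node base URL,
# since a "/node" suffix was observed to break API calls such as
# generateIdentifier().
normalize_mn_endpoint <- function(endpoint) {
  endpoint <- sub("/+$", "", endpoint)     # drop trailing slashes
  endpoint <- sub("/node$", "", endpoint)  # drop a trailing "/node" segment
  endpoint
}

normalize_mn_endpoint("https://test.arcticdata.io/metacat/d1/mn/v2/node")
```

This only patches a hardcoded URL; looking the node up by its identifier, rather than hardcoding the endpoint at all, avoids the problem entirely.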

mbjones commented 3 years ago

I think mn@endpoint is still incorrect. It should be https://test.arcticdata.io/metacat/d1/mn/v2. The endpoint can also change over time, so it is best practice not to hardcode it, and instead to look it up using the immutable node identifier. Here's an example from our training tutorial that uses the node identifier:

> library(dataone)
> d1c <- D1Client("STAGING", "urn:node:mnTestARCTIC")
> mn <- d1c@mn
> mn@endpoint
[1] "https://test.arcticdata.io/metacat/d1/mn/v2"

Note the difference between the endpoint there and the one you provided.

amoeba commented 3 years ago

Thanks @mbjones. @earnaud let us know if that works for you. The example directly above is the preferred way to use the package and is more robust.

earnaud commented 3 years ago

Thanks @mbjones and @amoeba. I am trying to build a data package upload module in a Shiny app. I keep a table of two variables, cn and mn (sample row: "STAGING", "urn:node:mnTestARCTIC"). However, since @amoeba advised me to add "/node" for the PNDB instance, I was looking for a uniform way to write all the MN entries.

Anyway, adding "/node" at the end of an MNode endpoint does not seem to solve my initial problem ;) Still investigating.
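Following the advice to key everything on the immutable node identifier, the cn/mn table described above could store only identifiers and reject anything that looks like an endpoint URL. This is a minimal base R sketch: the `nodes` table and `is_node_id` check are hypothetical, and the "urn:node:" pattern is an assumption based on the identifiers that appear in this thread.

```r
# Hypothetical node table for a Shiny upload module: one row per target,
# with the CN environment and the member node identifier.
nodes <- data.frame(
  cn = c("STAGING", "STAGING"),
  mn = c("urn:node:mnTestARCTIC",
         "https://test.pndb.fr/metacat/d1/mn/v2/node"),  # a URL, not an identifier
  stringsAsFactors = FALSE
)

# Hypothetical check (assumption): node identifiers look like
# "urn:node:<name>"; anything else, such as an endpoint URL, is flagged.
is_node_id <- function(x) grepl("^urn:node:[^/[:space:]]+$", x)

nodes$valid <- is_node_id(nodes$mn)
nodes[!nodes$valid, c("cn", "mn")]  # rows that need fixing

# Valid rows can then be passed to the constructor shown above, e.g.
# d1c <- dataone::D1Client(nodes$cn[1], nodes$mn[1])
```

The actual D1Client call is left commented out because it needs network access; the validation itself runs offline.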

amoeba commented 3 years ago

> However, since @amoeba advised me to add "/node" for the PNDB instance

Ah, that's my fault. Sorry for the mistake and for leading you down that rabbit hole. I think the D1Client approach should help. Also check out listNodes, which lists the nodes in an environment.

earnaud commented 3 years ago

Yup, I tried listNodes and got both the PNDB and Arctic Data Center test MNs listed. This way I can make every listed node accessible.