beepsoft closed this issue 1 year ago.
Hmm, this is a difficult problem and I'm not sure there's a good solution that doesn't have side effects.
Creating a random ID is easy. But that's not the problem.
Say a user of the app creates a person entity: Jane Doe. Describo internally creates the following JSON-LD snippet:
{
  "@id": "#Jane%20Doe",
  "@type": "Person",
  "name": "Jane Doe"
}
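The name-to-id mapping can be sketched roughly as follows. This is a minimal Python illustration, not describo's actual code (describo itself is a JavaScript component); the helpers `id_from_name` and `new_entity` are hypothetical, and percent-encoding the name with `urllib.parse.quote` is an assumption that happens to reproduce the "#Jane%20Doe" form shown above.

```python
from urllib.parse import quote


def id_from_name(name: str) -> str:
    """Build a local JSON-LD identifier from an entity name (hypothetical helper).

    Spaces and other unsafe characters are percent-encoded, so
    "Jane Doe" becomes "#Jane%20Doe".
    """
    return "#" + quote(name)


def new_entity(name: str, entity_type: str) -> dict:
    # The @id is derived from the name, so it stays recognisable in the raw
    # metadata even if the user never assigns an ORCID later.
    return {"@id": id_from_name(name), "@type": entity_type, "name": name}


print(new_entity("Jane Doe", "Person"))
# → {'@id': '#Jane%20Doe', '@type': 'Person', 'name': 'Jane Doe'}
```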
At this point, the user should edit the entity and assign it an identifier that uniquely identifies the person, preferably an ORCID (as the RO-Crate spec recommends). However, it's highly likely that the user won't do that. In that case, an @id of "#Jane%20Doe" makes the entity much more recognisable to the user than an autogenerated, unique id such as "#sdfkhsufw4r34tnk".
Of course, this means that creating an entity of another type with the same name (and therefore the same id) will result in describo linking the wrong entity rather than creating a new one (expected behaviour, not a bug). The question is how likely this is to happen (I would argue not very) and, if it does happen, whether it justifies losing the recognisability of the @id by using auto-generated random ids (I don't think so).
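The clash can be reproduced with a small sketch of a lookup keyed only on the name-derived @id. This is an illustration under assumptions, not describo's code: `add_or_link` is a hypothetical helper, the crate is modelled as a plain dict keyed by @id, and "EducationLevel" is a placeholder type name for the "Education level" entity from the original report.

```python
from urllib.parse import quote


def add_or_link(graph: dict, name: str, entity_type: str) -> dict:
    """Return the entity to link for `name` (hypothetical helper).

    If an entity with the name-derived @id already exists, it is reused
    regardless of its @type; otherwise a new entity is created.
    """
    entity_id = "#" + quote(name)
    if entity_id in graph:
        return graph[entity_id]  # existing entity is linked, even if the type differs
    entity = {"@id": entity_id, "@type": entity_type, "name": name}
    graph[entity_id] = entity
    return entity


graph: dict = {}
author = add_or_link(graph, "aaa", "Person")
level = add_or_link(graph, "aaa", "EducationLevel")
print(level["@type"])  # Person: the second entity was never created
print(len(graph))      # 1
```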
Your example is purely a testing artefact. That's not to say it can't happen; I just don't think it's very likely. And I don't think the tradeoff described above justifies changing the current behaviour.
As for generating a random id if there is a clash: that's easily doable, but again, I don't think it's a worthwhile tradeoff given my comments above.
When creating a new entity, users must enter its name, which will be displayed to them instead of the ID. For example, if a user creates a Person entity with the following JSON object:
{
  "@id": "#sdfkhsufw4r34tnk",
  "@type": "Person",
  "name": "Jane Doe"
}
the user will see "Jane Doe" as the linked author, rather than the @id. Additionally, in the "Browse entities" window, the entity will be identified by its name.
Or am I missing something here? In which view would an auto-generated ID confuse the user?
No, you're not missing anything. Inside describo they see the name.
I was referring to the underlying data and how it's used or handled outside of describo. I think auto-generating ids is not the done thing in the RO-Crate world (when I first developed describo, that's exactly what I did) and we shouldn't rely on it.
Like I said at the start of my message: "I'm not sure there's a good solution that doesn't have side effects."
One reason for not generating random ids is that it forces the user to actually consider setting the @id property of an entity to something sensible and relevant. If every entity gets a random id, users never need to consider this, so they end up creating new 'versions' of each entity, which is also undesirable.
I understand your point. Still, wouldn't it be possible to let the application using the crate builder decide on the ID format? For example, it could automatically look up or suggest an ORCID for a newly created Person and use that as the @id.
May I try to implement this and submit a PR?
Looking up ORCIDs is not straightforward, and it's hard to disambiguate the results automatically. But it's not just an issue for people; it's an issue for any type of entity.
Feel free to have a look, but the problem is not that we can't generate ids. The issue is what it means and what the flow-on effects are.
Yes, automatically assigning an ORCID (or any other identifier) is hard to automate, which is why I thought of a suggestion mechanism that the application could provide (we have this in other applications dealing with authors and authorships).
I think it is generally a good idea for a component like the crate builder to provide sensible defaults, like the current id generation, but to allow extensions or custom implementations when required. What do you think?
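A "sensible default plus optional override" design might look roughly like the sketch below. Everything here is hypothetical: `CrateBuilder`, `IdGenerator`, `default_id` and `suggest_orcid` are illustrative names, not the crate builder's real API, and the ORCID in `KNOWN_ORCIDS` is an all-zero placeholder, not a real identifier.

```python
from typing import Callable, Optional
from urllib.parse import quote

# Hypothetical hook signature: receives the new entity's name and type and
# returns an @id, or None to fall back to the default.
IdGenerator = Callable[[str, str], Optional[str]]


def default_id(name: str, entity_type: str) -> str:
    # Current-style default: derive the id from the name.
    return "#" + quote(name)


class CrateBuilder:
    """Minimal stand-in for a crate builder with an optional id hook (hypothetical)."""

    def __init__(self, id_generator: Optional[IdGenerator] = None):
        self.id_generator = id_generator
        self.entities: dict = {}

    def create_entity(self, name: str, entity_type: str) -> dict:
        entity_id = None
        if self.id_generator is not None:
            entity_id = self.id_generator(name, entity_type)
        if entity_id is None:
            entity_id = default_id(name, entity_type)  # fall back to the default
        entity = {"@id": entity_id, "@type": entity_type, "name": name}
        self.entities[entity_id] = entity
        return entity


# Application-provided suggestions, e.g. from its own author registry.
# The ORCID below is a placeholder, not a real identifier.
KNOWN_ORCIDS = {"Jane Doe": "https://orcid.org/0000-0000-0000-0000"}


def suggest_orcid(name: str, entity_type: str) -> Optional[str]:
    if entity_type == "Person":
        return KNOWN_ORCIDS.get(name)
    return None


builder = CrateBuilder(id_generator=suggest_orcid)
jane = builder.create_entity("Jane Doe", "Person")        # gets the suggested ORCID
lab = builder.create_entity("Acme Lab", "Organization")   # falls back to "#Acme%20Lab"
```

Returning `None` from the hook keeps the current behaviour as the default, so an application only overrides the cases it actually knows something about.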
I also feel that more callbacks/hooks would make the component more adaptable to custom use cases. For example, a callback invoked just before a context entity is added could set default values (similar to the id), such as the creation date, the name/link of the creator, or anything else the application may get from other sources.
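A before-add hook of the kind proposed here could be sketched as follows. The hook type `BeforeAddHook`, the functions `add_entity` and `stamp_defaults`, and the `"#current-user"` creator id are all hypothetical; `dateCreated` and `creator` are schema.org properties of `CreativeWork`, which is why the example entity uses that type.

```python
from datetime import date
from typing import Callable, List

# Hypothetical hook type: receives the entity about to be added and returns
# the (possibly enriched) entity that should actually be stored.
BeforeAddHook = Callable[[dict], dict]


def add_entity(crate: List[dict], entity: dict, hooks: List[BeforeAddHook]) -> dict:
    """Run every before-add hook in order, then store the result."""
    for hook in hooks:
        entity = hook(entity)
    crate.append(entity)
    return entity


def stamp_defaults(entity: dict) -> dict:
    """Example hook: fill in defaults without overwriting user-entered values."""
    enriched = dict(entity)
    enriched.setdefault("dateCreated", date.today().isoformat())
    # "#current-user" is a placeholder for whatever the application knows.
    enriched.setdefault("creator", {"@id": "#current-user"})
    return enriched


crate: List[dict] = []
notes = add_entity(
    crate,
    {"@id": "#Interview%20notes", "@type": "CreativeWork", "name": "Interview notes"},
    [stamp_defaults],
)
```

Using `setdefault` means the hook only fills gaps; anything the user already entered wins.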
I feel like this has veered off course. The component can easily be adapted to generate unique ids, but as explained above, I think that's a solution that potentially causes other issues. So far, you've seen this issue during development, with a typical developer testing flow. It has not been seen in production and I don't believe it will be.
I understand why you want more callbacks/hooks, but I would like you to consider the complexity this kind of idea adds.
Please check this. After thinking about it for some time, I couldn't come up with a good reason not to generate a random id when the id matches an existing entity but the type is different. I still think this could lead to users not replacing the random strings with sensible ids, but we can always revisit it if it introduces unintended consequences.
This should no longer be an issue, so I'm closing this ticket. Reopen it or create a new one if you see it again.
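The rule described here (link on a same-type match, random id on a different-type match) might be sketched like this. It is an illustration of the described behaviour, not describo's actual implementation: `resolve_entity` is a hypothetical helper, `uuid.uuid4` as the source of randomness is an assumption, and "EducationLevel" is a placeholder type name.

```python
import uuid
from urllib.parse import quote


def resolve_entity(graph: dict, name: str, entity_type: str) -> dict:
    """Link or create an entity (hypothetical helper).

    - same name-based @id and same @type: link the existing entity
    - same name-based @id, different @type: create a new entity with a random @id
    - no match: create a new entity with the name-based @id
    """
    entity_id = "#" + quote(name)
    existing = graph.get(entity_id)
    if existing is not None:
        if existing["@type"] == entity_type:
            return existing
        entity_id = "#" + str(uuid.uuid4())  # clash with a different type
    entity = {"@id": entity_id, "@type": entity_type, "name": name}
    graph[entity_id] = entity
    return entity


graph: dict = {}
person = resolve_entity(graph, "aaa", "Person")
level = resolve_entity(graph, "aaa", "EducationLevel")   # gets a random @id
same_person = resolve_entity(graph, "aaa", "Person")     # links the existing Person
```

Two simplifications are worth noting: `@type` is compared as-is, although in JSON-LD it may also be an array, and a second "aaa" Education level would get yet another random id instead of linking the first one, because the lookup only checks the name-based id.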
I tried to add two different entities, a Person and an Education level, and named both "aaa" when creating them. As a result, I ended up with two entities sharing the id "#aaa"; more precisely, there is a single Person entity with "#aaa", and both the Person reference and the Education level reference point to this Person object, which is wrong. Here's a short video about it:
Wouldn't it be better to generate IDs randomly (e.g. UUIDs) rather than basing them on the entity's name?
Or, optionally, a user-provided callback function could supply the ID for a new entity. It could then generate an ID based on the name if there's no clash, or a randomized ID in case of a clash.
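The proposed callback could be sketched as below. The callback shape `IdCallback` and the functions `name_or_random_id` and `create_entity` are hypothetical, as is the use of `uuid.uuid4().hex` for the randomized part and "EducationLevel" as a type name.

```python
import uuid
from typing import Callable, Set
from urllib.parse import quote

# Hypothetical callback shape: receives the new entity's name and the ids
# already present in the crate, and returns the @id to use.
IdCallback = Callable[[str, Set[str]], str]


def name_or_random_id(name: str, existing_ids: Set[str]) -> str:
    candidate = "#" + quote(name)
    if candidate not in existing_ids:
        return candidate  # readable, name-based id when there is no clash
    return "#" + uuid.uuid4().hex  # randomized id on any clash


def create_entity(name: str, entity_type: str, graph: dict, make_id: IdCallback) -> dict:
    entity_id = make_id(name, set(graph))  # set(graph) is the set of existing @ids
    entity = {"@id": entity_id, "@type": entity_type, "name": name}
    graph[entity_id] = entity
    return entity


graph: dict = {}
first = create_entity("aaa", "Person", graph, name_or_random_id)
second = create_entity("aaa", "EducationLevel", graph, name_or_random_id)
```

Unlike a type-aware rule, this callback never links an existing entity: every name clash produces a new entity, which is exactly the 'new versions' concern raised in the discussion, so the application would still need to decide when linking is intended.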