[ ] What happens if [the code has] no licence? Does it mean that SWH will not archive it?
Answer: No, it is still archived. Without a licence, however, the code cannot be reused, and it can be removed.
[ ] Does Step 9: Metadata deposit cover the case where a piece of software exists but has no repository, for example closed-source code?
Answer: Not at the moment. Depositing metadata in SWH is only allowed for an existing URL or SWHID.
[ ] Which SWHID is to be requested? Is there anything "higher" in abstraction than snapshot?
Answer: We usually suggest the directory ("dir") SWHID with all parameters, as it is the most stable SWHID. The approach might be different for metadata records. A "higher" identifier is a concept identifier: it is not persistent, not stable, and not a PID, so associating a record with one is less recommended. Even a snapshot is less recommended than a directory-with-context SWHID.
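To make "dir SWHID with all parameters" concrete, here is a minimal sketch that assembles such an identifier. It assumes the qualifier names from the SWHID specification (origin, visit, anchor, path); the function name, the hashes and the origin URL are hypothetical placeholders, not real archive objects.

```python
def qualified_dir_swhid(dir_hash, origin=None, visit=None, anchor=None, path=None):
    """Build a directory SWHID, appending whichever context qualifiers are given."""
    hex_digits = "0123456789abcdef"
    if len(dir_hash) != 40 or any(c not in hex_digits for c in dir_hash):
        raise ValueError("expected a 40-character lowercase hex hash")
    swhid = f"swh:1:dir:{dir_hash}"
    # Qualifiers follow the core identifier as ;key=value pairs.
    # This sketch does not escape ';' or '%' inside the values.
    qualifiers = (("origin", origin), ("visit", visit), ("anchor", anchor), ("path", path))
    for key, value in qualifiers:
        if value is not None:
            swhid += f";{key}={value}"
    return swhid


# Hypothetical placeholder values, for illustration only.
example = qualified_dir_swhid(
    "0" * 40,
    origin="https://example.org/user/project",
    visit="swh:1:snp:" + "1" * 40,
    anchor="swh:1:rev:" + "2" * 40,
    path="/",
)
print(example)
```

The part before the first ";" is the core identifier and depends only on the directory content; the qualifiers record where and when that directory was observed.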
[ ] Does SWH keep the code's history over time? For example, I might have written a paper 3 months ago using version 6.1 of SW X... Today I receive a validation request that I approve, but SW X is now at version 6.3. Will my record be associated with version 6.3?
Answer: A "Save Code Now" request pulls the complete development history. From this full snapshot, you can find 6.3 in the releases.
[ ] A question for Softcite: how useful would it be to store this feedback in the future? Do we need the validation only to display or hide the link between paper and SW, or are we also looking to build an automatically generated dataset that can then be used to train a new version of Softcite? A decision on this will define the quantity and complexity of the data we track.
Answer (from @kermitt2): I think it is indeed important to log all these manual validations so they can be reused as training data. From the point of view of training, false positives and false negatives are the most useful. Here we can capture false negatives, which is already helpful. However, to clarify: the validation will be at document level (as mentioned, anything finer would be too heavy for a normal user), so it will only be useful for training and evaluating the disambiguation model (not for software mention recognition).
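As a rough illustration of finding a given version inside a full snapshot, the sketch below scans a snapshot's branch listing for a matching tag. It assumes the listing is a mapping from branch names to entries with "target_type" and "target" keys, which is how the SWH Web API presents snapshot branches as far as we know; the function name and all sample data are hypothetical.

```python
def find_version(branches, version):
    """Return (branch_name, swhid) for the tag matching `version`, or None."""
    type_codes = {"release": "rel", "revision": "rev"}
    for name, target in sorted(branches.items()):
        if not name.startswith("refs/tags/"):
            continue
        tag = name[len("refs/tags/"):]
        # Accept both "6.1" and "v6.1" style tags.
        if tag.lstrip("v") == version and target["target_type"] in type_codes:
            code = type_codes[target["target_type"]]
            return name, f"swh:1:{code}:{target['target']}"
    return None


# Hypothetical branch listing; the hashes are placeholders.
sample_branches = {
    "refs/heads/main": {"target_type": "revision", "target": "a" * 40},
    "refs/tags/v6.1": {"target_type": "release", "target": "b" * 40},
    "refs/tags/v6.3": {"target_type": "release", "target": "c" * 40},
}

print(find_version(sample_branches, "6.1"))
print(find_version(sample_branches, "6.3"))
```

Because the whole history is archived, older and newer versions are both reachable from the same snapshot, provided the project tagged them.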
[ ] Does SWH link to a specific piece of code / commit within the repo, or does it link to the repository itself?
Answer: It depends on the type of SWHID. A SWHID is a hash; any change in the content changes the SWHID. There are 5 types of SWHIDs (content, directory, revision, release and snapshot), so an identifier can point to a single file, a directory, a commit, a release or the state of the whole repository: see https://www.swhid.org/specification/v1.1/4.Syntax/.
[ ] How is matching against existing software assets managed? There are various levels of disambiguation we can do; what support does the Software Heritage API offer for this?
Answer: SWH can be queried for URLs or parts of a URL. It can also resolve SWHIDs (a core SWHID or a complete SWHID with context).
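To show what a given SWHID points to, here is a small sketch that reads the object type out of the identifier. The type codes (cnt, dir, rev, rel, snp) come from the SWHID syntax linked above; the helper name and the example hash are made up for illustration.

```python
import re

# The five object types defined by the SWHID syntax.
SWHID_TYPES = {
    "cnt": "content (a single file)",
    "dir": "directory",
    "rev": "revision (a commit)",
    "rel": "release",
    "snp": "snapshot (state of the whole repository)",
}

CORE_SWHID = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})$")


def describe_swhid(swhid):
    """Return (type_code, description, hash) for a core or qualified SWHID."""
    core = swhid.split(";", 1)[0]  # drop any ;key=value qualifiers
    match = CORE_SWHID.match(core)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {core!r}")
    type_code, object_hash = match.groups()
    return type_code, SWHID_TYPES[type_code], object_hash


# Placeholder hash, not a real archive object.
print(describe_swhid("swh:1:rev:" + "a" * 40))
```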
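The two kinds of query mentioned here can be sketched as request URLs. The endpoint paths below (origin/search and resolve under /api/1) reflect our understanding of the public SWH Web API and should be checked against its documentation; the helper names are hypothetical and the sketch only builds URLs, it does not contact the archive.

```python
from urllib.parse import quote

API_ROOT = "https://archive.softwareheritage.org/api/1"


def origin_search_url(url_fragment, limit=10):
    """URL that searches archived origins whose URL contains `url_fragment`."""
    return f"{API_ROOT}/origin/search/{quote(url_fragment)}/?limit={limit}"


def resolve_url(swhid):
    """URL that asks the archive to resolve a core or qualified SWHID."""
    return f"{API_ROOT}/resolve/{swhid}/"


print(origin_search_url("softcite", limit=5))
print(resolve_url("swh:1:dir:" + "0" * 40))  # placeholder hash
```

Either URL can then be fetched with any HTTP client (for example urllib.request.urlopen) and the JSON response inspected to decide whether a mentioned software asset is already in the archive.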
List of potential questions: