vigna closed this issue 2 years ago.
Things like ${...} are in the database because they were in our initial data set. It looks like the Maven Crawler did not perform any POM property reference resolution, so these properties were never replaced with their actual values.
This issue can be resolved by fixing the Maven Crawler and re-running it over Maven.
@mir-am I would appreciate it if you could fix that. The fix should be quite simple because I already implemented this property reference resolution in POMAnalyzer, in DataExtractor.replacePropertyReferences(...) on the develop branch.
I will also try to make sure that the retrieved data is consistent and that there are no packages without associated versions.
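For readers unfamiliar with property reference resolution: the idea is to replace each ${name} placeholder with the value of the property name defined in (or implied by) the POM. The following is a hypothetical Java illustration of that idea, not the actual DataExtractor.replacePropertyReferences(...) code; the property map and the values in main are assumptions made up for the example. References with no known value are deliberately left in place so they can still be detected afterwards.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; NOT the actual DataExtractor.replacePropertyReferences(...).
class PropertyReferenceSketch {
    // Matches ${name} and captures the property name.
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)}");

    // Replaces every ${name} in value with properties.get(name).
    // References without a known value are kept verbatim so they stay detectable.
    static String replacePropertyReferences(String value, Map<String, String> properties) {
        Matcher m = REFERENCE.matcher(value);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String resolved = properties.get(m.group(1));
            String replacement = resolved != null ? resolved : m.group();
            // quoteReplacement stops '$' or '\' in the value from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed example values, not taken from any real POM.
        Map<String, String> properties = Map.of("pom.groupId", "com.example", "servlet.version", "1.0.0");
        System.out.println(replacePropertyReferences("${pom.groupId}:javax.servlet:${servlet.version}", properties));
        // prints com.example:javax.servlet:1.0.0
    }
}
```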
@MihhailSokolov I can add this feature to the Maven crawler, i.e., resolving property references. However, re-running the crawler is extremely expensive: it takes months to gather hundreds of thousands of Maven packages. That said, I think you can still use the POM analyzer on the crawler's output topic to resolve the described cases.
No, POMAnalyzer cannot resolve these property references. To find the value of the property, i.e., to resolve the reference, it needs to know the coordinate in order to download the POM file, and that is impossible when the coordinate is ${pom.groupId}:javax.servlet:1.0.0. The only way to fix this is to fix MavenCrawler and re-run it. If that is so expensive, then I guess it is better to discuss it with @gousiosg and @proksch too.
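To make the point concrete: once a dependency has been stored only as a coordinate string, nothing in ${pom.groupId}:javax.servlet:1.0.0 says which POM defines pom.groupId, so the most a consumer can do from the string alone is detect that it is unresolved. A minimal sketch of such a check, assuming a plain groupId:artifactId:version string; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of POMAnalyzer or MavenCrawler.
class UnresolvedCoordinateCheck {
    private static final String[] FIELDS = {"groupId", "artifactId", "version"};

    // Returns the names of the coordinate parts that still contain a ${...} reference.
    static List<String> unresolvedParts(String gav) {
        String[] parts = gav.split(":", -1);
        if (parts.length != FIELDS.length) {
            throw new IllegalArgumentException("Expected groupId:artifactId:version but got " + gav);
        }
        List<String> unresolved = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].contains("${")) {
                unresolved.add(FIELDS[i]);
            }
        }
        return unresolved;
    }

    public static void main(String[] args) {
        System.out.println(unresolvedParts("${pom.groupId}:javax.servlet:1.0.0")); // prints [groupId]
    }
}
```

A coordinate flagged this way can only be repaired with information from outside the string itself, such as the POM it was declared in.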
I see! Okay, I can fix this in the crawler, but the good news is that the POM URL is included in the record for such cases, so you can download the POM and possibly resolve the property reference.
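As a rough sketch of what using the POM URL could look like: once the POM at that URL has been downloaded, the project's own coordinates can be read from it and exposed under the project.* property names and their deprecated pom.* aliases, falling back to the parent element for groupId and version, as Maven does when they are omitted. This assumes the POM content is already available as a string (the download itself is left out) and uses only the JDK's DOM parser; the class and method names are hypothetical and not part of POMAnalyzer.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch; not part of POMAnalyzer. Assumes the POM was already downloaded.
class PomSelfProperties {
    // First direct child element of parent with the given tag name, or null.
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Trimmed text of the named direct child, or null if parent or child is missing.
    static String childText(Element parent, String name) {
        Element e = parent == null ? null : child(parent, name);
        return e == null ? null : e.getTextContent().trim();
    }

    // Maps project.groupId/artifactId/version (and the pom.* aliases) to their values.
    static Map<String, String> selfProperties(String pomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // POMs come from the network: refuse DOCTYPE declarations to rule out XXE.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element project = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(pomXml)))
                    .getDocumentElement();
            Element parent = child(project, "parent");
            Map<String, String> properties = new HashMap<>();
            for (String field : new String[] {"groupId", "artifactId", "version"}) {
                String value = childText(project, field);
                // Maven inherits groupId and version from <parent> when they are omitted.
                if (value == null && !field.equals("artifactId")) {
                    value = childText(parent, field);
                }
                if (value != null) {
                    properties.put("project." + field, value);
                    properties.put("pom." + field, value);
                }
            }
            return properties;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse POM", e);
        }
    }

    public static void main(String[] args) {
        String pom = "<project xmlns=\"http://maven.apache.org/POM/4.0.0\">"
                + "<parent><groupId>com.example</groupId><artifactId>parent</artifactId><version>1.0</version></parent>"
                + "<artifactId>child</artifactId></project>";
        System.out.println(selfProperties(pom).get("pom.groupId")); // prints com.example
    }
}
```

The resulting map can then be fed to any ${...} substitution routine. Properties declared in the POM's own properties section, or inherited from further up the parent chain, are not handled by this sketch.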
Ah, I forgot about the POM URL. Then, when we have time later, we can write a small script that goes through the records produced by MavenCrawler and produces them to a new topic with the unresolved references fixed. That way we don't need to re-run MavenCrawler from the beginning. I will open an issue.
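The repair script could look roughly like this sketch. It assumes a simplified record holding just a coordinate and a POM URL, and abstracts both the Kafka topics (as lists) and the POM download plus property extraction (as a function from POM URL to property map); every name here is hypothetical. Records that still contain a reference after resolution are set aside instead of being produced, so they cannot end up in the database again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the repair script; record shape and all names are assumptions.
class RepairUnresolvedRecords {
    // Simplified stand-in for a MavenCrawler output record.
    record CrawlerRecord(String coordinate, String pomUrl) {}

    // Literal (non-regex) substitution of ${key} by value for every known property.
    // Not recursive: a value that itself contains ${...} is left as is.
    static String resolve(String text, Map<String, String> properties) {
        for (Map.Entry<String, String> e : properties.entrySet()) {
            text = text.replace("${" + e.getKey() + "}", e.getValue());
        }
        return text;
    }

    // Returns the records to produce to the new topic. Records that are still
    // unresolved after using their POM are added to `failed` instead.
    static List<CrawlerRecord> repair(List<CrawlerRecord> input,
                                      Function<String, Map<String, String>> propertiesForPomUrl,
                                      List<CrawlerRecord> failed) {
        List<CrawlerRecord> output = new ArrayList<>();
        for (CrawlerRecord r : input) {
            if (!r.coordinate().contains("${")) {
                output.add(r); // already clean: pass through unchanged
                continue;
            }
            String fixed = resolve(r.coordinate(), propertiesForPomUrl.apply(r.pomUrl()));
            if (fixed.contains("${")) {
                failed.add(r);
            } else {
                output.add(new CrawlerRecord(fixed, r.pomUrl()));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Assumed stand-in for "download the POM at this URL and extract its properties".
        Function<String, Map<String, String>> fetch = url -> Map.of("pom.groupId", "com.example");
        List<CrawlerRecord> failed = new ArrayList<>();
        List<CrawlerRecord> out = repair(
                List.of(new CrawlerRecord("${pom.groupId}:javax.servlet:1.0.0", "https://repo.example/x.pom")),
                fetch, failed);
        System.out.println(out.get(0).coordinate()); // prints com.example:javax.servlet:1.0.0
    }
}
```

Only records that actually contain a reference trigger a POM download, which keeps the cost far below a full crawler re-run.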
The required functionality has been implemented in 447e419. As soon as we clean the database and restart writing to it, this issue will be resolved.
This should be fixed now with the new improvements to the POMAnalyzer.
In the output of GraphMavenResolver.resolveFullDependencySet() for it.unimi.dsi / dsiutils / 2.2.2 (compile scope), we find dependencies like ${pom.groupId}:javax.servlet:1.0.0 containing what appear to be unresolved Maven variables. As far as we could ascertain, these products have no associated revision in the database (i.e., they appear in the "packages" table but have no associated row in "package_versions"). They should not appear in the output, as the output is supposed to consist of the revisions on which dsiutils-2.2.2 depends (and they probably should not appear in the database at all).
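One way to express the expected behavior: only dependencies that correspond to an actual revision (a row in package_versions) are kept, and anything still containing an unresolved ${...} reference is dropped. This is a hypothetical sketch that represents dependencies and revisions as plain groupId:artifactId:version strings; the actual types returned by GraphMavenResolver.resolveFullDependencySet() are not shown in this thread.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: dependencies and revisions are plain "groupId:artifactId:version" strings.
class DependencySetFilter {
    // Keeps only dependencies that have an associated revision, in input iteration order.
    static Set<String> keepKnownRevisions(Set<String> dependencies, Set<String> knownRevisions) {
        Set<String> result = new LinkedHashSet<>();
        for (String gav : dependencies) {
            // An unresolved ${...} reference cannot denote a real revision, even if
            // the string happens to be stored somewhere in the database.
            if (!gav.contains("${") && knownRevisions.contains(gav)) {
                result.add(gav);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> deps = Set.of("${pom.groupId}:javax.servlet:1.0.0", "com.example:lib:1.0");
        Set<String> known = Set.of("com.example:lib:1.0");
        System.out.println(keepKnownRevisions(deps, known)); // prints [com.example:lib:1.0]
    }
}
```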