johannesduesing closed this pull request 3 years ago.
A partial answer:

… `processing` package than in the `preprocessing` package.

I fixed the two points you raised in the latest commit. That leaves the issue of storing the data. Currently, the `ElasticStoreQueries` trait supports storing a `MavenIdentifier` and a `HermesResult`; however, the publication date and metadata are only available in the `MavenArtifact` class.
My plan would be to write an additional method that stores a `MavenArtifact` by extracting its `MavenIdentifier` and writing the publication date and metadata, if available, to the database (similar to what is being done for `HermesResult`). I would then attach this as a sink to the "Processing" stage using the `.alsoTo` operator, similar to the current implementation for storing `MavenIdentifier`s.
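A minimal sketch of that plan, assuming simplified record shapes for `MavenIdentifier`, `ArtifactMetadata` and `MavenArtifact` (the real classes are Scala and their exact fields are not shown here). It is written in plain Java so it runs without Akka or Elasticsearch: the hypothetical `storeArtifact` builds the document that would be written (identifier always, `published` and `pom` only when present), and the hypothetical `alsoTo` helper only mimics, for `java.util.stream`, the idea of Akka Streams' `alsoTo`, where each element is also sent to a side sink while continuing downstream.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ArtifactStoreSketch {
    // Hypothetical stand-ins for the crawler's Scala classes (field names are assumptions).
    record MavenIdentifier(String groupId, String artifactId, String version) {}
    record ArtifactMetadata(String name, String description) {}
    record MavenArtifact(MavenIdentifier identifier,
                         Optional<Instant> publicationDate,
                         Optional<ArtifactMetadata> metadata) {}

    // Builds the document that would be written for one artifact:
    // the identifier is always present, "published" and "pom" only if available.
    static Map<String, Object> storeArtifact(MavenArtifact artifact) {
        MavenIdentifier id = artifact.identifier();
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("identifier", Map.of(
                "groupId", id.groupId(),
                "artifactId", id.artifactId(),
                "version", id.version()));
        artifact.publicationDate().ifPresent(date -> doc.put("published", date.toString()));
        artifact.metadata().ifPresent(meta -> doc.put("pom", Map.of(
                "name", meta.name(),
                "description", meta.description())));
        return doc;
    }

    // Pass-through stage in the spirit of Akka Streams' alsoTo:
    // every element is handed to the side sink and then continues downstream unchanged.
    static <T> UnaryOperator<T> alsoTo(Consumer<? super T> sink) {
        return element -> {
            sink.accept(element);
            return element;
        };
    }

    public static void main(String[] args) {
        List<Map<String, Object>> written = new ArrayList<>(); // stands in for the database
        MavenArtifact xom = new MavenArtifact(
                new MavenIdentifier("xom", "xom", "1.2.5"),
                Optional.of(Instant.parse("2010-05-12T06:22:10Z")),
                Optional.of(new ArtifactMetadata("XOM", "The XOM Dual Streaming/Tree API for Processing XML")));
        // The "Processing" stage keeps emitting identifiers while the side sink stores the artifact.
        List<MavenIdentifier> downstream = Stream.of(xom)
                .map(ArtifactStoreSketch.<MavenArtifact>alsoTo(a -> written.add(storeArtifact(a))))
                .map(MavenArtifact::identifier)
                .collect(Collectors.toList());
        System.out.println(written.get(0).get("published")); // 2010-05-12T06:22:10Z
        System.out.println(downstream.get(0).version());     // 1.2.5
    }
}
```

Keeping the store as a side sink means the main pipeline's element type does not change, which matches how the existing `MavenIdentifier` sink is described above.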
Do you agree with that plan? If so, do you want me to implement the whole thing, or should I make it a skeleton implementation until we have discussed the Elasticsearch data model changes in depth?
Here's the latest update to this PR:

… `${foo.version}`), the crawler attempts to resolve them. Variable resolution starts in the local POM, but parent POMs are downloaded and processed if required and available. The same goes for dependencies without a version: the implementation recurses through all parents to find the matching version definition. The scope of dependencies is also extracted.

I tested the application on my machine using a fresh Elasticsearch instance (version 5.6.9), and POM file processing seems to work fine. For me, the only thing left to discuss is a suitable data model for storing the data. Using the current implementation, a search query to Elasticsearch yields the following result:
```json
[...]
  "identifier" : {
    "groupId" : "xom",
    "artifactId" : "xom",
    "version" : "1.2.5"
  },
  "discovered" : "2020-09-21T15:10:34.824+02:00",
  "published" : "2010-05-12T06:22:10.000Z",
  "pom" : {
    "parent" : "None",
    "licenses" : [
      {
        "name" : "The GNU Lesser General Public License, Version 2.1",
        "url" : "http://www.gnu.org/licenses/lgpl-2.1.html"
      }
    ],
    "issueManagement" : "None",
    "developers" : "elharo",
    "name" : "XOM",
    "description" : "The XOM Dual Streaming/Tree API for Processing XML",
    "packaging" : "jar",
    "dependencies" : [
      {
        "groupId" : "xml-apis",
        "scope" : "default",
        "artifactId" : "xml-apis",
        "version" : "1.3.03"
      },
      {
        "groupId" : "xerces",
        "scope" : "default",
        "artifactId" : "xercesImpl",
        "version" : "2.8.0"
      },
      {
        "groupId" : "xalan",
        "scope" : "default",
        "artifactId" : "xalan",
        "version" : "2.7.0"
      }
    ]
  }
}
```
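The variable and version resolution described in the update (for placeholders such as `${foo.version}`) can be sketched as a lookup that walks from the local POM up through its parents. This is a hypothetical, simplified model in plain Java, not the PR's code: a `Pom` record holding its own properties, its managed dependency versions keyed by `groupId:artifactId`, and an optional, already-loaded parent. Downloading parent POMs, Maven's built-in properties such as `${project.version}`, and protection against self-referencing properties are left out.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PomResolutionSketch {
    // Hypothetical, simplified POM: own properties, versions from <dependencyManagement>
    // keyed by "groupId:artifactId", and an already-loaded parent (if any).
    record Pom(Map<String, String> properties,
               Map<String, String> managedVersions,
               Optional<Pom> parent) {}

    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    // Child-first lookup: the local POM, then each parent in turn.
    static Optional<String> lookupProperty(Pom pom, String name) {
        for (Optional<Pom> current = Optional.of(pom); current.isPresent(); current = current.get().parent()) {
            String value = current.get().properties().get(name);
            if (value != null) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    // Replaces every ${name} that can be resolved; unknown variables are kept verbatim.
    // No cycle detection: a property that refers to itself would overflow the stack.
    static String resolve(Pom pom, String raw) {
        Matcher m = VARIABLE.matcher(raw);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement = lookupProperty(pom, m.group(1))
                    .map(value -> resolve(pom, value)) // values may contain variables themselves
                    .orElse(m.group());
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // A dependency without a version takes the first managed version found
    // while walking up the parent chain, resolved in the context of the local POM.
    static Optional<String> managedVersion(Pom pom, String groupId, String artifactId) {
        String key = groupId + ":" + artifactId;
        for (Optional<Pom> current = Optional.of(pom); current.isPresent(); current = current.get().parent()) {
            String version = current.get().managedVersions().get(key);
            if (version != null) {
                return Optional.of(resolve(pom, version));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Pom parent = new Pom(Map.of("foo.version", "2.8.0"),
                             Map.of("xerces:xercesImpl", "${foo.version}"),
                             Optional.empty());
        Pom child = new Pom(Map.of(), Map.of(), Optional.of(parent));
        System.out.println(resolve(child, "${foo.version}"));                     // 2.8.0
        System.out.println(managedVersion(child, "xerces", "xercesImpl").get()); // 2.8.0
    }
}
```

Because every lookup starts at the local POM, a property redefined in a child takes precedence over the parent's value, which is the child-first search order described in the update.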
I am unsure whether this is the correct way to store lists (such as `dependencies` and `licenses` in the result above) in Elasticsearch. @bhermann, what is your opinion on that?
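One point relevant to that question: with the default `object` mapping, Elasticsearch flattens an array of objects into one multi-valued field per sub-field, so the association between the `groupId`, `artifactId` and `version` of a single dependency is not preserved; the `nested` field type keeps each element as its own hidden document. The Java snippet below only simulates that flattening on plain maps, using the dependency values from the result above, and does not call Elasticsearch. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlatteningSketch {
    // Simulates the default object mapping: one value list per field path.
    static Map<String, List<String>> flatten(String prefix, List<Map<String, String>> objects) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (Map<String, String> object : objects) {
            for (Map.Entry<String, String> e : object.entrySet()) {
                fields.computeIfAbsent(prefix + "." + e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return fields;
    }

    // Models two term clauses against the flattened fields:
    // each value only has to occur somewhere in its field's list.
    static boolean matchesFlattened(Map<String, List<String>> fields, String groupId, String artifactId) {
        return fields.getOrDefault("pom.dependencies.groupId", List.of()).contains(groupId)
                && fields.getOrDefault("pom.dependencies.artifactId", List.of()).contains(artifactId);
    }

    // Models a nested query: both values must come from the same dependency object.
    static boolean matchesNested(List<Map<String, String>> dependencies, String groupId, String artifactId) {
        return dependencies.stream()
                .anyMatch(d -> groupId.equals(d.get("groupId")) && artifactId.equals(d.get("artifactId")));
    }

    public static void main(String[] args) {
        List<Map<String, String>> deps = List.of(
                Map.of("groupId", "xml-apis", "artifactId", "xml-apis", "version", "1.3.03"),
                Map.of("groupId", "xerces", "artifactId", "xercesImpl", "version", "2.8.0"),
                Map.of("groupId", "xalan", "artifactId", "xalan", "version", "2.7.0"));
        Map<String, List<String>> flat = flatten("pom.dependencies", deps);
        System.out.println(matchesFlattened(flat, "xerces", "xalan")); // true (false positive)
        System.out.println(matchesNested(deps, "xerces", "xalan"));    // false
    }
}
```

So with the current mapping, a query for `groupId = xerces` and `artifactId = xalan` would match XOM although it has no such dependency. Mapping `pom.dependencies` and `pom.licenses` as `nested` would avoid that, at the price of extra hidden documents per artifact; whether queries need that linkage is a question for the data model discussion.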
Kudos, SonarCloud Quality Gate passed!
0 Bugs
0 Vulnerabilities (and 0 Security Hotspots to review)
0 Code Smells
No Coverage information
0.0% Duplication
Closed, as this functionality is now part of the redesign proposed in #50.
Reason for this PR

According to #15, the Delphi crawler does not yet process any artifact information stored in the respective POM file. This means that potentially interesting data fields (including project name, description, etc.) are not accessible when querying Delphi. In addition, the publication date of an artifact is not processed either (see #37).
Changes in this PR

- `MavenArtifact` class with optional attributes `publicationDate` and `metadata` of type `ArtifactMetadata`
- `ArtifactMetadata` that is supposed to hold information parsed from POM files, currently `name`, `description`, and the system name & URL of the `issueManagement`
- … `MavenDownloadActor` and set accordingly
- `PomFileReadActor`. Reads the POM file for a given `MavenArtifact` and sets the `ArtifactMetadata` accordingly. Currently triggered in the `MavenDiscoveryProcess` as part of preprocessing. Uses Apache Xpp3Reader for POM file processing.

Open for discussion
@bhermann, what's your opinion on these questions?
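For illustration of the `PomFileReadActor` step listed under the changes, here is what extracting the `ArtifactMetadata` fields (name, description, issue management system and URL) from a POM could look like. It deliberately swaps the PR's Apache Xpp3Reader for the JDK's built-in DOM parser (`javax.xml.parsers`) so it runs without the Maven model dependency; the `ArtifactMetadata` record and `PomMetadataSketch.parse` are hypothetical, and only direct children of `<project>` are read, without variable resolution or parent handling.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

public class PomMetadataSketch {
    // Hypothetical stand-in for the PR's ArtifactMetadata; each field may be absent from a POM.
    record ArtifactMetadata(Optional<String> name,
                            Optional<String> description,
                            Optional<String> issueManagementSystem,
                            Optional<String> issueManagementUrl) {}

    // First direct child element with the given local name. Direct children only, so that
    // e.g. <developers><developer><name> is not mistaken for the project name.
    static Optional<Element> child(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && localName.equals(e.getLocalName())) {
                return Optional.of(e);
            }
        }
        return Optional.empty();
    }

    static Optional<String> childText(Element parent, String localName) {
        return child(parent, localName).map(e -> e.getTextContent().trim());
    }

    static ArtifactMetadata parse(String pomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // POMs normally declare the Maven POM namespace
            // Crawled POMs are untrusted input: refuse DOCTYPE declarations to rule out XXE.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element project = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Optional<Element> issues = child(project, "issueManagement");
            return new ArtifactMetadata(
                    childText(project, "name"),
                    childText(project, "description"),
                    issues.flatMap(i -> childText(i, "system")),
                    issues.flatMap(i -> childText(i, "url")));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse POM", e);
        }
    }

    public static void main(String[] args) {
        ArtifactMetadata meta = parse("""
                <project xmlns="http://maven.apache.org/POM/4.0.0">
                  <name>XOM</name>
                  <description>The XOM Dual Streaming/Tree API for Processing XML</description>
                </project>
                """);
        System.out.println(meta.name().orElse("-"));                  // XOM
        System.out.println(meta.issueManagementSystem().orElse("-")); // -
    }
}
```

Modelling every field as optional mirrors the PR's choice of optional `publicationDate` and `metadata` attributes: many POMs omit `<description>` or `<issueManagement>`, and absent values can then simply be left out of the stored document.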