fasten-project / fasten

Analyse package dependency networks at the call graph level
https://www.fasten-project.eu
Apache License 2.0

GraphMavenResolver reports packages without associated revisions #121

Closed: vigna closed this issue 2 years ago

vigna commented 3 years ago

In the output of GraphMavenResolver.resolveFullDependencySet() for it.unimi.dsi / dsiutils / 2.2.2 (compile scope) we find dependencies like ${pom.groupId}:javax.servlet:1.0.0 that contain what appear to be unresolved Maven variables. As far as we could ascertain, these products have no associated revision in the database (i.e., they appear in the "packages" table but have no associated row in "package_versions"). They should not appear in the output, as the output is supposed to consist of the revisions on which dsiutils-2.2.2 depends (and they probably should not appear in the database either).

cos:cos:05Nov2002
org.jruby:jruby:1.0.1
javax.activation:activation:1.1
jline:jline:1.0
poi:poi:2.5.1
javax.xml:jsr173:1.0
commons-net:commons-net:2.2
jetty:org.mortbay.jetty:5.1.4
org.apache.tiles:tiles-jsp:2.0.6
javax.ejb:ejb:3.0
org.codehaus.cargo:cargo-core-container-jo:0.8
cglib:cglib:2.1_3
dom4j:dom4j:1.6.1
backport-util-concurrent:backport-util-concurrent:3.0
hibernate:hibernate:2.1.8
org.aspectj:aspectjweaver:1.6.1
org.apache.xmlgraphics:batik-dom:1.7
axion:axion:1.0-M3-dev
msv:relaxngDatatype:20030807
mondrian:mondrian:2.3.2.8944
commons-lang:commons-lang:2.6
javax.servlet:jstl:1.1.0
bsh:bsh:1.2b3
org.apache.lucene:lucene-core:2.0.0
commons-io:commons-io:2.4
com.ibm.icu:icu4j:2.6.1
activation:activation:1.0.2
org.codehaus.cargo:cargo-core-container-jetty:0.8
jfree:jfreechart:[1.0.0,)
org.apache.tiles:tiles-api:2.0.6
bsf:bsf:2.4.0
velocity-tools:velocity-tools-view:1.4
werken-xpath:werken-xpath:0.9.4
avalon-framework:avalon-framework:4.1.3
javax.jts:jts:1.0
org.slf4j:slf4j-jdk14:1.7.6
org.apache.xmlgraphics:batik-svggen:1.7
com.thoughtworks.xstream:xstream:1.4.2
ch.qos.logback:logback-classic:1.1.2
qdox:qdox:1.5
org.jboss.logging:jboss-logging-spi:2.1.2.GA
org.jboss.netty:netty:3.2.7.Final
org.springframework:spring-web:2.0.2
dbunit:dbunit:2.1
checkstyle:checkstyle-optional:4.3
com.ibatis:ibatis2:2.3.0.677
org.apache.xmlgraphics:batik-gvt:1.7
${pom.groupId}:javax.servlet:1.0.0
emma:emma_ant:2.1.5320
commons-collections:commons-collections-testframework:3.2.1
xdoclet:xjavadoc:1.1
junit-addons:junit-addons:1.4
ehcache:ehcache:1.2
org.codehaus.jettison:jettison:1.2
xmlpull:xmlpull:1.1.3.1
howl:howl-logger:0.1.11
msv:xsdlib:20030807
oro:oro:2.0.8
org.springframework:spring-webmvc:2.0.2
castor:castor:0.9.9.0-pre
commons-cli:commons-cli:1.0
postgresql:postgresql:8.4-701.jdbc4
asm:asm-attrs:2.2
org.apache.xmlgraphics:batik-script:1.7
com.caucho:hessian:3.1.3
radeox:radeox:0.9
org.slf4j:slf4j-simple:1.3.1
xerces:xercesImpl:2.4.0
org.jdom:jaxen-core:1.0-FCS
org.springframework:spring:2.5.6
stax:stax-api:1.0
org.aspectj:aspectjrt:1.6.1
org.jdom:saxpath:1.0-FCS
cglib:cglib-full:2.0.2
org.apache.geronimo.specs:geronimo-j2ee-connector_1.5_spec:1.0
aopalliance:aopalliance:1.0
jmock:jmock-cglib:1.2.0
org.apache.xmlgraphics:batik-awt-util:1.7
org.apache.mina:mina-filter-ssl:1.1.7
javax.mail:mail:1.4
net.sf.kxml:kxml2-min:2.3.0
xerces:xmlParserAPIs:2.6.2
org.objectweb.carol:carol:2.0.5
commons-beanutils:commons-beanutils-core:1.7.0
jboss:jboss-minimal:4.0.2
jstl:jstl:1.0.6
javax.transaction:jta:1.1
org.mockito:mockito-all:1.8.2
javax.xml:jaxrpc-api:1.1
jexcelapi:jxl:2.6.6
mx4j:mx4j:3.0.2
${pom.groupId}:org.osgi.core:1.4.0
taglibs:standard:1.1.2
poi:poi-2.0-final:20040126
org.slf4j:nlog4j:1.2.24
org.springframework:spring-aspects:2.0.2
commons-pool:commons-pool:1.4
regexp:regexp:1.3
org.codehaus.cargo:cargo-core-container-orion:0.8
org.apache.felix:org.osgi.core:1.4.0
org.apache.ant:ant-testutil:1.7.0
jboss:jboss-system:4.0.2
com.tonicsystems:jarjar:0.6
org.apache.geronimo.specs:geronimo-j2ee-deployment_1.1_spec:1.0
org.hibernate:hibernate-entitymanager:3.3.2.GA
org.springframework:spring-core:2.0.2
commons-beanutils:commons-beanutils:1.8.3
slide:webdavlib:2.0
javacc:javacc:3.2
org.apache.poi:poi:3.0.1-FINAL
org.fusesource.jansi:jansi:1.6
com.sun.jmx:jmxri:1.2.1
jdbm:jdbm:1.0
org.apache.felix:org.apache.felix.main:2.0.2
javax.resource:connector:1.0
cglib:cglib-nodep:2.2
jboss:jboss-j2se:200504122039
emma:emma:2.1.5320
org.testng:testng:6.5.2
org.bouncycastle:bcprov-jdk14:1.45
org.codehaus.janino:commons-compiler:2.6.1
org.apache.jackrabbit:jackrabbit-jcr-commons:1.5.2
javax.sql:jdbc-stdext:2.0
org.codehaus.cargo:cargo-core-api-util:0.8
commons-attributes:commons-attributes-api:2.2
opensymphony:oscache:2.1
org.apache.ant:ant-nodeps:1.7.1
org.apache.commons:commons-compress:1.1
javax.security:jacc:1.0
antlr:antlr:2.7.2
rhino:js:1.6R2
commons-jxpath:commons-jxpath:1.3
${pom.groupId}:org.apache.felix.shell:1.4.1
hibernate:antlr:2.7.5H3
org.springframework:spring-dao:2.0.2
aspectj:aspectjweaver:1.5.3
htmlunit:htmlunit:1.8
com.martiansoftware:jsap:2.1
com.icegreen:greenmail:1.3
tomcat:naming-java:5.0.28
httpunit:httpunit:1.6.1
org.codehaus.jcsp:jcsp:1.1-rc5
org.springframework:spring-aop:2.0.2
javax.xml.soap:saaj-api:1.3
bouncycastle:bcmail-jdk14:138
xom:xom:1.0b3
org.apache.felix:org.osgi.compendium:1.4.0
servletapi:servletapi:2.3
org.freemarker:freemarker:2.3.14
org.apache.xmlgraphics:batik-anim:1.7
groovy:groovy:1.0
org.codehaus.groovy:groovy:1.5.6
proxool:proxool:0.8.3
commons-primitives:commons-primitives:1.0
jboss:javassist:3.3.ga
com.caucho:burlap:2.1.12
ant:ant:1.5.1
org.codehaus.cargo:cargo-core-api-module:0.8
org.subethamail:subethasmtp:2.1.0
xpp3:xpp3:1.1.3.3
radeox:radeox-oro:0.9
opensymphony:quartz-all:1.6.0
org.apache.commons:commons-vfs2:2.0
commons-validator:commons-validator:1.3.1
org.springframework:spring-beans:2.0.2
org.codehaus.cargo:cargo-core-container-geronimo:0.8
jaxen:jaxen:1.1-beta-6
org.apache.bsf:bsf-api:3.1
org.codehaus.jsr166-mirror:jsr166y:1.7.0
commons-collections:commons-collections:20040616
commons-jelly:commons-jelly-tags-xml:1.0
commons-jelly:commons-jelly-tags-log:1.0
com.google.inject:guice:2.0
net.sourceforge.jexcelapi:jxl:2.6
org.slf4j:integration:1.7.6
org.codehaus.cargo:cargo-ant:0.8
com.mockrunner:mockrunner-jdk1.3-j2ee1.3:0.4
javax.resource:connector-api:1.5
xalan:xalan:2.5.1
org.codehaus.gpars:gpars:1.0.0
${pom.groupId}:org.apache.felix.shell.tui:1.4.1
org.codehaus.cargo:cargo-core-uberjar:0.8
jtidy:jtidy:4aug2000r7-dev
velocity:velocity-dep:1.4
org.jdom:jaxen-jdom:1.0-FCS
org.apache.mina:mina-integration-jmx:1.1.7
com.servlets:cos:05Nov2002
commons-configuration:commons-configuration:1.8
commons-modeler:commons-modeler:2.0
org.codehaus.cargo:cargo-core-api-generic:0.8
com.google.guava:guava:18.0
log4j:log4j:1.2.17
junitperf:junitperf:1.8
xerces:xerces-impl:2.6.2
org.apache.shale:shale-test:1.0.4
org.springframework:spring-jpa:2.0.2
org.apache.xmlgraphics:batik-bridge:1.7
geronimo-spec:geronimo-spec-jta:1.0.1B-rc2
com.jcraft:jsch:0.1.42
jasperreports:jasperreports:2.0.5
org.codehaus.cargo:cargo-core-container-tomcat:0.8
${pom.groupId}:org.apache.felix.bundlerepository:1.4.2
ch.qos.logback:logback-core:1.1.2
logkit:logkit:1.0.1
commons-jelly:commons-jelly-tags-junit:1.0
com.oracle:oc4j:1.0
stax:stax-ri:1.0
jotm:jotm_iiop_stubs:2.0.10
xmlunit:xmlunit:1.1
net.sf.kxml:kxml2:2.3.0
org.slf4j:jul-to-slf4j:1.7.6
org.apache.ibatis:ibatis-sqlmap:2.3.4.726
org.apache.geronimo.specs:geronimo-ejb_2.1_spec:1.0
org.hamcrest:hamcrest-core:1.3
org.apache.ant:ant-antlr:1.8.4
com.cenqua.clover:clover:1.3.13
org.apache.jackrabbit:jackrabbit-webdav:1.5.2
commons-attributes:commons-attributes-compiler:2.2
org.springframework:spring-remoting:1.2.8
javax.jms:jms:1.1
com.sun.jdmk:jmxtools:1.2.1
jgroups:jgroups-all:2.4.1
org.apache.xmlgraphics:batik-css:1.7
asm:asm-commons:2.2.3
com.bea.xml:jsr173-ri:1.0
org.apache.felix:org.osgi.foundation:1.2.0
c3p0:c3p0:0.9.1.2
jboss:jboss-archive-browsing:5.0.0alpha-200607201-119
woodstox:wstx-asl:3.2.2
org.apache.directory.server:apacheds-core:1.0-RC3
com.h2database:h2:1.2.132
jdom:jdom:1.0
com.lowagie:itext:2.0.7
tomcat:catalina:5.5.23
jmock:jmock:1.0.0
com.bea.wlplatform:commonj-twm:1.1
org.acegisecurity:acegi-security:1.0.3
stax:stax:1.2.0
org.apache.xmlgraphics:batik-xml:1.7
org.codehaus.cargo:cargo-core-container-jboss:0.8
com.google.code.findbugs:jsr305:1.3.9
groovy:groovy-all-minimal:1.0
org.apache.geronimo.specs:geronimo-servlet_2.4_spec:1.0
cactus:cactus:12-1.4.1
org.beanshell:bsh:2.0b4
org.codehaus.cargo:cargo-core-container-weblogic:0.8
joda-time:joda-time:1.6
org.hibernate:ejb3-persistence:1.0.1.GA
org.springframework:spring-context:2.0.2
commons-javaflow:commons-javaflow:20060411
org.codehaus.cargo:cargo-core-container-resin:0.8
struts:struts:1.2.9
org.slf4j:slf4j-ext:1.7.6
mockobjects:mockobjects-core:0.09
aspectj:aspectjrt:1.5.3
junit:junit:4.12-beta-1
xpp3:xpp3_min:1.1.4c
nekohtml:nekohtml:0.9.5
pull-parser:pull-parser:2
org.slf4j:slf4j-log4j12:1.7.6
com.thoughtworks.qdox:qdox:1.12
com.google.protobuf:protobuf-java:2.4.1
org.springframework:spring-support:1.2.8
org.mockito:mockito-core:1.9.0
org.apache.commons:commons-math3:3.3
com.oracle:toplink-essentials:2.41
org.codehaus.groovy:groovy-all:2.0.7
org.easymock:easymock:3.1
org.objenesis:objenesis:1.0
quartz:quartz:1.6.0
org.codehaus.cargo:cargo-core-api-container:0.8
xml-apis:xml-apis-ext:1.3.04
org.springframework:spring-jdbc:2.0.2
net.sf.hibernate:hibernate:2.1.8
saxpath:saxpath:1.0-FCS
javax.servlet:servlet-api:2.3
xml-resolver:xml-resolver:1.2
org.springframework:spring-agent:2.0.2
javax.jcr:jcr:1.0
commons-codec:commons-codec:1.5
org.codehaus.woodstox:wstx-asl:3.2.7
mockobjects:mockobjects-jdk1.4-j2ee1.3:0.09
org.apache.xmlgraphics:batik-util:1.7
javassist:javassist:3.4.GA
com.experlog:xapool:1.5.0
org.easymock:easymockclassextension:3.1
idb:idb:3.26
org.springframework:spring-mock:2.0.2
forehead:forehead:1.0-beta-5
commons-digester:commons-digester:1.8.1
xerces:xerces:2.4.0
com.oracle.toplink:toplink:10.1.3
org.apache.ant:ant-junit:1.8.4
javax.faces:jsf-api:1.1
org.jdom:jdom:1.1
jboss:jboss-jee:4.2.0.GA
xml-apis:xml-apis:1.0.b2
org.apache.geronimo.specs:geronimo-jms_1.1_spec:1.0
commons-dbcp:commons-dbcp:1.2.2
velocity:velocity:1.5
gsbase:gsbase:2.0.1
commons-jelly:commons-jelly:1.0
org.apache.commons:commons-jexl:2.1.1
openejb:openejb-loader:1.0
itext:itext:1.3
ch.qos.cal10n:cal10n-api:0.8.1
org.yaml:snakeyaml:1.6
asm:asm-tree:2.2.3
commons-logging:commons-logging:1.1.1
bouncycastle:bcprov-jdk14:138
org.apache.mina:mina-core:
jruby:jruby:0.9.2
commons-fileupload:commons-fileupload:1.2
jotm:jotm_jrmp_stubs:2.0.10
org.apache.xmlgraphics:batik-parser:1.7
org.apache.velocity:velocity:1.6.2
org.slf4j:log4j-over-slf4j:1.7.6
org.apache.openejb:javaee-api:5.0-2
org.ccil.cowan.tagsoup:tagsoup:0.9.7
swarmcache:swarmcache:1.0RC2
jaxme:jaxme-api:0.3
com.jamonapi:jamon:2.4
org.hibernate:hibernate-annotations:3.3.1.GA
${pom.groupId}:org.apache.felix.framework:2.0.2
jboss:jboss-cache:1.2.2
org.hibernate:hibernate:3.2.6.ga
org.json:json:20080701
org.apache.ant:ant-launcher:1.7.1
xstream:xstream:1.2
ant:ant-junit:1.6.5
org.bouncycastle:bcpg-jdk14:1.45
javax.portlet:portlet-api:1.0
it.unimi.dsi:fastutil:6.5.15
eclipse:jdtcore:[3.1.0,)
jfree:jcommon:[1.0.0,)
ch.qos.logback:logback-classic.jar:2.2.2
org.apache.geronimo.specs:geronimo-jta_1.0.1B_spec:1.0
ant:ant-nodeps:1.6.2
portlet-api:portlet-api:1.0
groovy:groovy-all:1.0-beta-10
openejb:openejb-core:1.0
org.mockejb:mockejb:0.6-beta2
commons-vfs:commons-vfs:1.0
mysql:mysql-connector-java:5.1.9
net.sf.ehcache:ehcache:1.5.0
javax.servlet:jsp-api:2.0
asm:asm-analysis:2.2
ant:ant-trax:1.6.2
com.keypoint:png-encoder:1.5
checkstyle:checkstyle:4.3
org.apache.ant:ant:1.7.1
javax.persistence:persistence-api:1.0
org.codehaus.janino:janino:2.6.1
commons-httpclient:commons-httpclient:2.0.2
mx4j:mx4j-jmx:2.1.1
javax.jdo:jdo2-api:2.0
com.beust:jcommander:1.12
commons-logging:commons-logging-api:1.1
org.apache.ivy:ivy:2.2.0
net.sf.jsr107cache:jsr107cache:1.0
commons-beanutils:commons-beanutils-bean-collections:1.7.0
org.slf4j:slf4j-api:1.7.7
commons-discovery:commons-discovery:20030211.213356
asm:asm-util:2.2.3
ant:ant-launcher:1.6.2
jotm:jotm:2.0.10
freemarker:freemarker:2.3.8
tomcat:naming-common:5.0.28
com.megginson.sax:xml-writer:0.2
org.hibernate:hibernate-commons-annotations:3.0.0.ga
org.multiverse:multiverse-beta:0.7-RC-1
com.jcraft:jzlib:1.0.7
org.apache.xmlgraphics:batik-ext:1.7
org.apache.directory.server:apacheds-core-shared:1.0-RC3
org.apache.tiles:tiles-core:2.0.6
concurrent:concurrent:1.3.4
asm:asm:2.2.3
org.apache.axis:axis:1.4
cglib-nodep:cglib-nodep:2.1_3
javax.j2ee:j2ee:1.4
org.apache.xmlgraphics:batik-svg-dom:1.7
hessian:hessian:3.0.20
jboss:jboss-common:4.0.2
velocity-tools:velocity-tools-generic:1.4
cactus:cactus-ant:1.4.1
org.apache.xmlgraphics:batik-js:1.7
hsqldb:hsqldb:1.8.0.7
commons-jexl:commons-jexl:1.0
cas:casclient:2.0.11
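
To spot the problematic entries in a list like the one above, a check along the following lines might help. It is only a minimal sketch: the class ResolvedCoordinateCheck and its methods are hypothetical names, not part of FASTEN, and it assumes coordinates are printed as groupId:artifactId:version. It flags entries containing a ${...} reference or an empty component. Version ranges such as [1.0.0,) are not checked.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of FASTEN) that flags coordinates which look unresolved. */
final class ResolvedCoordinateCheck {

    /** Returns true if the coordinate has three non-empty parts and no ${...} reference. */
    static boolean looksResolved(String coordinate) {
        // A limit of -1 keeps trailing empty strings, so "g:a:" yields an empty version part.
        String[] parts = coordinate.split(":", -1);
        if (parts.length != 3) {
            return false;
        }
        for (String part : parts) {
            if (part.isEmpty() || part.contains("${")) {
                return false;
            }
        }
        return true;
    }

    /** Returns the coordinates that do not pass looksResolved, in input order. */
    static List<String> suspicious(List<String> coordinates) {
        List<String> result = new ArrayList<>();
        for (String coordinate : coordinates) {
            if (!looksResolved(coordinate)) {
                result.add(coordinate);
            }
        }
        return result;
    }
}
```

Applied to the output above, suspicious(...) would return the ${pom.groupId} entries as well as org.apache.mina:mina-core:, which has no version.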

MihhailSokolov commented 3 years ago

Things like ${...} are in the database because they were in our initial data set. It looks like the Maven Crawler did not apply POM property reference resolution to replace these properties with their actual values. This issue can be resolved by fixing the Maven Crawler and re-running it through Maven. @mir-am I would appreciate it if you could fix that. The fix should be quite simple because I have already implemented this property reference resolution in POMAnalyzer, in DataExtractor.replacePropertyReferences(...) on the develop branch. I will also try to make sure that the retrieved data is consistent and that there are no packages without associated versions.
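
For readers unfamiliar with the idea, property reference resolution can be sketched roughly as follows. This is not the code of DataExtractor.replacePropertyReferences(...). The class PropertyReferenceSketch, its resolve method, and the idea of passing the known properties as a plain map are all assumptions made for illustration. References whose key is not in the map are left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of ${...} substitution; not the FASTEN implementation. */
final class PropertyReferenceSketch {

    // Captures the property name inside ${...}, e.g. "pom.groupId".
    private static final Pattern REFERENCE = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Replaces every ${key} in value with properties.get(key) when the key is known. */
    static String resolve(String value, Map<String, String> properties) {
        Matcher matcher = REFERENCE.matcher(value);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Fall back to the original text (e.g. "${unknown}") if the key is missing.
            String replacement = properties.getOrDefault(matcher.group(1), matcher.group());
            // quoteReplacement keeps '$' and '\' in the replacement literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

With a map containing pom.groupId mapped to org.example, resolve("${pom.groupId}:javax.servlet:1.0.0", map) gives org.example:javax.servlet:1.0.0. The hard part in practice is building that map, since values may come from the POM itself or from its parents.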

mir-am commented 3 years ago

@MihhailSokolov I can add this feature to the Maven crawler, i.e., resolving property references. However, re-running the crawler is extremely expensive. It takes months to gather hundreds of thousands of Maven packages. That said, I think that you can still use the POM analyzer on the crawler's output topic for resolving the described cases.

MihhailSokolov commented 3 years ago

No, POMAnalyzer cannot resolve these property references. To find the value of the property, i.e., to resolve the reference, it needs to know the coordinate in order to download the corresponding POM file, and that is impossible if the coordinate is ${pom.groupId}:javax.servlet:1.0.0. The only way to fix this is to fix MavenCrawler and re-run it. If that is so expensive, then I guess it is better to discuss it with @gousiosg and @proksch too.
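
To illustrate why the coordinate is needed: in the standard Maven repository layout, the location of a POM is derived from the group ID, artifact ID, and version. The sketch below builds that relative path. The class PomPathSketch and its method are hypothetical names chosen for this example, and it assumes coordinates in groupId:artifactId:version form.

```java
/** Hypothetical sketch: derives the repository-relative POM path from a coordinate. */
final class PomPathSketch {

    /**
     * Returns e.g. "it/unimi/dsi/dsiutils/2.2.2/dsiutils-2.2.2.pom" for
     * "it.unimi.dsi:dsiutils:2.2.2". Rejects coordinates that cannot be mapped.
     */
    static String pomPath(String coordinate) {
        String[] parts = coordinate.split(":", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected groupId:artifactId:version: " + coordinate);
        }
        for (String part : parts) {
            // An unresolved reference or an empty part gives no usable path.
            if (part.isEmpty() || part.contains("${")) {
                throw new IllegalArgumentException("Unresolved or empty component in: " + coordinate);
            }
        }
        String groupId = parts[0];
        String artifactId = parts[1];
        String version = parts[2];
        // Dots in the group ID become directory separators.
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".pom";
    }
}
```

For ${pom.groupId}:javax.servlet:1.0.0 the method throws, because the group ID directory cannot be determined from the coordinate alone.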

mir-am commented 3 years ago

I see! Okay, I can fix this in the crawler. The good news is that the POM URL is included in the record for such cases, so you can download the POM and possibly resolve the property reference.

MihhailSokolov commented 3 years ago

Ah, I forgot about the POM URL. Then, when we have time later, we can create a small script that goes through the records produced by MavenCrawler, fixes the unresolved references, and writes the corrected records to a new topic. That way we don't need to re-run the MavenCrawler from the beginning. I will add an issue.
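
The core of such a script could look roughly like the sketch below. It is only an illustration under several assumptions: the class GroupIdRepairSketch and its methods are hypothetical, the POM content is assumed to have already been downloaded from the record's POM URL, and ${pom.groupId} (or ${project.groupId}) is assumed to refer to the group ID declared in that POM, falling back to the parent's group ID when the POM declares none. Reading from and writing to topics is left out.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical sketch: repairs a ${pom.groupId} coordinate using the POM content. */
final class GroupIdRepairSketch {

    /** Returns the trimmed text of a direct child element, or null if there is none. */
    static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    /** Returns the project's groupId, or the parent's groupId, or null if neither exists. */
    static String effectiveGroupId(String pomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // POMs come from the network, so refuse DOCTYPE declarations.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Element project = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            String groupId = childText(project, "groupId");
            if (groupId != null) {
                return groupId;
            }
            // Look for a direct <parent> child and use its groupId instead.
            for (Node n = project.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element && "parent".equals(n.getNodeName())) {
                    return childText((Element) n, "groupId");
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse POM", e);
        }
    }

    /** Substitutes the group ID reference if it can be determined; otherwise returns the input. */
    static String repair(String coordinate, String pomXml) {
        String groupId = effectiveGroupId(pomXml);
        if (groupId == null) {
            return coordinate;
        }
        return coordinate.replace("${pom.groupId}", groupId)
                .replace("${project.groupId}", groupId);
    }
}
```

Other property references (for example user-defined properties or versions inherited from a parent POM) would need more work than this, since their values may live in POMs further up the hierarchy.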

MihhailSokolov commented 3 years ago

The required functionality has been implemented in 447e419. As soon as we clean the database and restart writing to it, this issue will be resolved.

mir-am commented 2 years ago

This should be fixed now with the new improvements to the POMAnalyzer.