Commonjava / atlas

Project-Graphing API

Problem with parsing Maven paths with tar.bz2 type/extension #96

Open grgrzybek opened 11 months ago

grgrzybek commented 11 months ago

With commit 4e0daa5cc6822161fdb272d2f57adab87f12e146 reverted, my test (in org.commonjava.atlas.maven.ident.util.ArtifactPathInfoTest) passes:

@Test
public void matchNormalClassifier3()
{
    String path = "/io/syndesis/s2i/s2i/1.15.0.fuse-7_13_0-00001-redhat-00001/s2i-1.15.0.fuse-7_13_0-00001-redhat-00001-m2.tar.bz2";
    ArtifactPathInfo pathInfo = ArtifactPathInfo.parse( path );
    assertThat( pathInfo.getVersion(), equalTo( "1.15.0.fuse-7_13_0-00001-redhat-00001" ) );
    assertThat( pathInfo.getClassifier(), equalTo( "m2" ) );
    assertThat( pathInfo.getType(), equalTo( "tar.bz2" ) );
}

Otherwise, the type is parsed as bz2 and the classifier as m2.tar.

grgrzybek commented 11 months ago

What's worse, ordinary artifacts like this one don't work either:

@Test
public void matchNormalClassifier3()
{
    String path = "/commons-x/commons-x/1/commons-x-1-sources.jar.sha1";
    ArtifactPathInfo pathInfo = ArtifactPathInfo.parse( path );
    assertThat( pathInfo.getVersion(), equalTo( "1" ) );
    assertThat( pathInfo.getClassifier(), equalTo( "sources" ) );
    assertThat( pathInfo.getType(), equalTo( "jar.sha1" ) );
}

java.lang.AssertionError: 
Expected: "sources"
     but: was "sources.jar"
Expected :sources
Actual   :sources.jar

grgrzybek commented 11 months ago

I'd rather assume that a classifier doesn't contain dots but may contain dashes; see https://repo1.maven.org/maven2/org/apache/activemq/activemq-karaf/5.18.2/, which has classifiers like activemq-webconsole or features-core.

cstamas commented 11 months ago

Yes, if the classifier contains dots, you are doomed :smile:

But you KNOW what the prefix is (artifactId + version). Strip that off, and what remains is (possibly) the classifier plus the extension. For example, this is what happens in the indexer (it does NOT work if the classifier contains a dot): https://github.com/apache/maven-indexer/blob/master/search-backend-remoterepository/src/main/java/org/apache/maven/search/backend/remoterepository/extractor/ResponseExtractorSupport.java

grgrzybek commented 11 months ago

Thanks for the comment - GitHub is a small place ;)

The problem is how to split classifier + . + multi-dotted extension...

cstamas commented 11 months ago

See the code I pasted (last method): it first strips off known hashes, then strips off the known prefix (artifactId + version), and then, IF you assume the classifier does not contain a dot (and Maven would not allow it), everything "dotted" is the extension. That code works for cases like "tar.gz", "tar.bz2", etc.

Debug this unit test: https://github.com/apache/maven-indexer/blob/master/search-backend-remoterepository/src/test/java/org/apache/maven/search/backend/remoterepository/internal/RemoteRepositorySearchBackendImplTest.java#L166
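
For readers following the thread, the three steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from maven-indexer or Atlas: the class `MavenFileNameSketch`, its `Parts` holder, and the list of hash suffixes are all hypothetical. It assumes the artifactId and version can be read from the last two directory segments of the path, that the file name starts with exactly `artifactId-version` (so timestamped snapshot file names are not handled), and that the classifier never contains a dot.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of Atlas or maven-indexer. */
final class MavenFileNameSketch {

    // Checksum suffixes stripped first. This list is an assumption, not exhaustive.
    private static final List<String> HASH_SUFFIXES =
            Arrays.asList(".sha1", ".md5", ".sha256", ".sha512");

    /** Parsed pieces of a repository path; classifier and checksum are null when absent. */
    static final class Parts {
        final String artifactId;
        final String version;
        final String classifier;
        final String extension; // may contain dots, e.g. "tar.bz2"
        final String checksum;  // e.g. "sha1"

        Parts(String artifactId, String version, String classifier,
              String extension, String checksum) {
            this.artifactId = artifactId;
            this.version = version;
            this.classifier = classifier;
            this.extension = extension;
            this.checksum = checksum;
        }
    }

    static Parts parse(String path) {
        // Layout assumed: .../<artifactId>/<version>/<fileName>
        String[] segments = path.split("/");
        if (segments.length < 4) {
            throw new IllegalArgumentException("Not a Maven repository path: " + path);
        }
        String fileName = segments[segments.length - 1];
        String version = segments[segments.length - 2];
        String artifactId = segments[segments.length - 3];

        // Step 1: strip a known hash suffix, if present.
        String checksum = null;
        for (String suffix : HASH_SUFFIXES) {
            if (fileName.endsWith(suffix)) {
                checksum = suffix.substring(1);
                fileName = fileName.substring(0, fileName.length() - suffix.length());
                break;
            }
        }

        // Step 2: strip the known prefix (artifactId + "-" + version).
        String prefix = artifactId + "-" + version;
        if (!fileName.startsWith(prefix)) {
            throw new IllegalArgumentException("File name does not match its directory: " + path);
        }
        String rest = fileName.substring(prefix.length()); // e.g. "-m2.tar.bz2" or ".jar"

        // Step 3: the classifier has no dots, so everything after the first dot is the extension.
        int firstDot = rest.indexOf('.');
        if (firstDot < 0) {
            throw new IllegalArgumentException("No extension found: " + path);
        }
        String classifierPart = rest.substring(0, firstDot);
        String extension = rest.substring(firstDot + 1);

        String classifier = null;
        if (!classifierPart.isEmpty()) {
            if (!classifierPart.startsWith("-")) {
                throw new IllegalArgumentException("Unexpected text after version: " + path);
            }
            classifier = classifierPart.substring(1); // dashes inside the classifier are kept
        }
        return new Parts(artifactId, version, classifier, extension, checksum);
    }
}
```

With this sketch, the s2i path from the first comment yields classifier `m2` and extension `tar.bz2`, and `commons-x-1-sources.jar.sha1` yields classifier `sources`, extension `jar`, and checksum `sha1`. Note that the checksum is reported separately here rather than as part of the type (`jar.sha1`), which differs from what the test earlier in the thread expects.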