Nexus-Scraper to proxy pdrone maven repository

ctron / package-drone

An OSGI first software artifact repository – Moved to the Eclipse Foundation

https://github.com/eclipse/packagedrone

Eclipse Public License 1.0

Nexus-Scraper to proxy pdrone maven repository #64

Closed: cornzy closed this issue 9 years ago

cornzy commented 9 years ago

If you want to proxy a pdrone Maven repository in Nexus, it is not possible to browse the pdrone repository. In the Nexus configuration (tab "Routing") you get the message "No scraper was able to scrape remote (or remote prevents scraping)."

Nexus has a set of scrapers that look for familiar index formats, e.g. an index file from an Apache server would start with the text "Apache" :(

http://grepcode.com/file/repo1.maven.org/maven2/org.sonatype.nexus/nexus-core/2.11.0-02/org/sonatype/nexus/proxy/maven/routing/internal/scrape/HttpdIndexScraper.java#HttpdIndexScraper

Maybe pdrone could either fake a familiar index page, or maybe it is possible to extend the Nexus scrapers with a custom-written "pdrone-scraper".
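To illustrate the kind of check described above, here is a minimal Python sketch of a detector that only recognizes a remote when the response identifies itself as Apache. This is not the Nexus code (which is Java, see the link above). The function name `looks_like_apache_index` and the exact rule (a `Server` header starting with "Apache" plus an "Index of" marker in the page) are assumptions for illustration only.

```python
def looks_like_apache_index(headers, body):
    """Return True if a response resembles an Apache-generated directory index.

    Assumed rule (illustrative only, not taken from Nexus): the Server header
    starts with "Apache" and the page body contains the text "Index of".
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    server = normalized.get("server", "")
    return server.startswith("Apache") and "Index of" in body


# The same listing is accepted or rejected depending only on the server name.
page = "<html><title>Index of /maven</title></html>"
print(looks_like_apache_index({"Server": "Apache/2.4"}, page))
print(looks_like_apache_index({"Server": "Jetty(9.2)"}, page))
```

A check of this shape explains why a repository served by anything other than the expected server is not recognized, even when its listing is otherwise readable.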

ctron commented 9 years ago

Ok, this has to be implemented!

Of course I would like to see a "Package Drone" scraper, but I guess this is just a fantasy ;-)

I would prefer the Nexus scraper: http://grepcode.com/file/repo1.maven.org/maven2/org.sonatype.nexus/nexus-core/2.11.0-02/org/sonatype/nexus/proxy/maven/routing/internal/scrape/NexusScraper.java?av=f

It only requires the "presence" of this ".meta" file and not some http server parameter. And I guess there is no harm in just creating another XML file during the channel aggregation for maven.
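As a rough sketch of what a presence-based check could look like, the following Python snippet probes for a marker file below the repository root. It is an assumption-laden illustration, not the Nexus implementation: the helper `has_marker`, the stand-in `fake_fetch`, the URL, and the file name `.meta/marker.xml` are all hypothetical, and the real file name should be taken from the NexusScraper source linked above.

```python
def has_marker(fetch_status, base_url, marker_path):
    """Return True if the marker file exists below base_url.

    fetch_status is any callable that takes a URL and returns an HTTP status
    code, so the check itself does not depend on a particular HTTP client.
    """
    url = base_url.rstrip("/") + "/" + marker_path.lstrip("/")
    return fetch_status(url) == 200


# Stand-in for a real HTTP client: only the (hypothetical) marker URL "exists".
def fake_fetch(url):
    return 200 if url == "http://localhost:8080/maven/demo/.meta/marker.xml" else 404


print(has_marker(fake_fetch, "http://localhost:8080/maven/demo/", ".meta/marker.xml"))
```

The point of the sketch is that nothing about the server software is inspected, only whether a known file can be fetched.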

cornzy commented 9 years ago

You're right, the Nexus scraper would be nicer. I thought it would be more complex.

ctron commented 9 years ago

I didn't fully check it, but from a first look it seems to be even easier, since you don't have to fake a Server header. Both variants need to create some sort of content (the index.html for Apache and the ".meta" file for Nexus). A dedicated XML file seems better to me, and there is no need to reply with an altered server name then.

On the other hand, there would be no "index.html" then, which would be interesting for browsing the channel. Maybe we should do both :wink:

But for the Scraper part, I tend to go with the Nexus Scraper.

ctron commented 9 years ago

So the next step would be to actually provide an index file, which the Nexus scraper still needs in addition.
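A minimal sketch of generating such an index page, assuming a plain HTML listing with relative links is enough. The function `render_index` and the page layout are hypothetical and are not what Package Drone actually produces; what a given scraper expects from the page would still need to be checked against its source.

```python
from html import escape


def render_index(path, entries):
    """Render a plain HTML directory listing for `path`.

    `entries` is a list of names; names ending in "/" denote sub-directories.
    """
    lines = [
        "<html><head><title>Index of %s</title></head><body>" % escape(path),
        "<h1>Index of %s</h1>" % escape(path),
        "<ul>",
    ]
    for name in sorted(entries):
        # Relative links let a crawler descend without knowing the base URL.
        lines.append('<li><a href="%s">%s</a></li>' % (escape(name), escape(name)))
    lines.append("</ul></body></html>")
    return "\n".join(lines)


print(render_index("/org/example/", ["demo/", "maven-metadata.xml"]))
```

Such a page would also cover the browsing use case mentioned earlier, since a person can follow the same links as a crawler.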

ctron commented 9 years ago

Is there a way you can test this with a local development environment? I ran a test with Nexus OSS, and it shows the directory index. But I am not sure how I can do a real test.

cornzy commented 9 years ago

Maybe I could test tomorrow, if I find some time. If you have admin access to any Nexus installation, you could simply add a proxy repository pointing to your local package drone, assuming that the Nexus has network access to your computer.

ctron commented 9 years ago

Ok, well that is what I did: set up a new Nexus (since we don't use it at all), added a new proxy repo pointing to a package drone channel, checked the "remote content" (I think it was) and saw the directory structure.

Is that it?

cornzy commented 9 years ago

Sounds good. I also just configured a repository and could browse the package drone repository in Nexus. Another important indication is the tab "Routing", where you'll find the "Discovery Status": Successful.

For some reason my "Browse Index" is still empty. Maybe there is another issue to fix.

ctron commented 9 years ago

I noticed this as well. So there is something missing?

ctron commented 9 years ago

Ok, I just checked with the "Central" proxy repository in the Nexus default setup. There, too, the "Browse Index" tab is empty.

cornzy commented 9 years ago

Then let's close this issue.

ctron commented 9 years ago

Ok!