apache / lucene-jira-archive

Jira archive for Apache Lucene
https://lucene.apache.org/

Consider spreading attachment folders to subfolders to avoid 10000+ folders under a single root #137

Open vlsi opened 2 years ago

vlsi commented 2 years ago

It looks like the current strategy is to use a structure like $JIRA_ID/.... However, you might have many issues, so it would make sense to spread the folders as $HASH/$JIRA_ID/..., so that no single folder listing gets too long. HASH could be the last two digits of the JIRA issue number.

It would make manual navigation easier.

See https://github.com/vlsi/tmp-jmeter-attachments
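A minimal sketch of the $HASH/$JIRA_ID/... layout suggested above, assuming HASH is the last two digits of the issue number, zero-padded for single-digit issues. The function name `bucketed_path` and the sample file name are hypothetical and are not taken from the migration scripts.

```python
import re
from pathlib import PurePosixPath


def bucketed_path(jira_id: str, filename: str, digits: int = 2) -> PurePosixPath:
    """Build $HASH/$JIRA_ID/filename (hypothetical helper, for illustration only).

    HASH is the last `digits` digits of the trailing issue number,
    zero-padded so that e.g. issue 7 lands in bucket "07".
    """
    match = re.search(r"(\d+)$", jira_id)
    if match is None:
        raise ValueError(f"no trailing issue number in {jira_id!r}")
    bucket = match.group(1).zfill(digits)[-digits:]
    return PurePosixPath(bucket) / jira_id / filename


if __name__ == "__main__":
    # Sample file name is made up; only the bucketing rule matters here.
    print(bucketed_path("LUCENE-10557", "example.patch"))
```

With two digits there are at most 100 top-level buckets, and each bucket holds roughly one hundredth of the issue folders.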

dweiss commented 2 years ago

Do we care though? It's not like this is going to be checked out by anybody - these are just files served if people click on a link somewhere. And for unix systems the number of files hardly makes a big difference, I think?

vlsi commented 2 years ago

The change is trivial, and it would ease cases like "open repo in GitHub UI"

vlsi commented 2 years ago

Here's a recent "big folder causes slowness" issue in Ant: https://bz.apache.org/bugzilla/show_bug.cgi?id=66048

dweiss commented 2 years ago

The change is trivial but it leads to less intuitive final URLs. I read the ant issue and like I said - I don't think people will ever clone the migrated attachments repository - why would anybody? I think it's better to have more intuitive attachment URLs than yield to potential needs that are somewhat esoteric.

vlsi commented 2 years ago

Note that when https://github.com/apache/lucene-jira-archive/issues/127 is implemented, the URLs become far from human-readable.

markup:

<img src='https://vlsi.github.io/tmp-jmeter-attachments/48/42248/31902-undo.png'>

rendering:

Actual URL in the GitHub UI (just inspect the image above): https://camo.githubusercontent.com/6879b0ddd141b7fe2856a5d1c0a77d4495c3f0e48ae2e4f18ad3d21f20c40668/68747470733a2f2f766c73692e6769746875622e696f2f746d702d6a6d657465722d6174746163686d656e74732f34382f34323234382f33313930322d756e646f2e706e67
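For what it's worth, the last path segment of the camo URL above appears to be just the original image URL, hex-encoded. Here is a small sketch that decodes it. It assumes the trailing segment is plain hex of the UTF-8 URL, as it is in this example; that is an observation from this one link, not a documented guarantee. The name `decode_camo_url` is made up for illustration.

```python
def decode_camo_url(camo_url: str) -> str:
    """Recover the original URL from a camo-style proxy URL.

    Assumption: the final path segment is the hex-encoded UTF-8 source URL,
    which matches the example in this thread.
    """
    hex_part = camo_url.rstrip("/").rsplit("/", 1)[-1]
    return bytes.fromhex(hex_part).decode("utf-8")


if __name__ == "__main__":
    camo = (
        "https://camo.githubusercontent.com/"
        "6879b0ddd141b7fe2856a5d1c0a77d4495c3f0e48ae2e4f18ad3d21f20c40668/"
        "68747470733a2f2f766c73692e6769746875622e696f2f746d702d6a6d657465722d"
        "6174746163686d656e74732f34382f34323234382f33313930322d756e646f2e706e67"
    )
    print(decode_camo_url(camo))
```

So the readable path is still recoverable, but nobody will see it at a glance in the rendered page.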

dweiss commented 2 years ago

Ok, fair enough. But for non-inlined links we'd still show the un-obfuscated URL, right? I honestly don't think the number of files in a folder matters much here but feel free to do as you wish.

mocobeta commented 2 years ago

We have other tasks and, realistically speaking, there is only one developer who can work on this issue (me). I'd like to save implementation effort if the current directory structure does not cause any practical issues. Could you please tell us what exactly the problem is here? I'm not against the suggestion; I'm just saying the priority is low for me unless it's a really critical problem or other people want to pick this up.

vlsi commented 2 years ago

GitHub limits a directory listing to 1000 entries: https://github.com/DefinitelyTyped/DefinitelyTyped/tree/master/types