eclipse-platform / eclipse.platform

https://eclipse.dev/eclipse/
Eclipse Public License 2.0
83 stars 113 forks source link

Proposal: Opening/Closing Mechanism for Zip Files #1413

Open Michael5601 opened 5 months ago

Michael5601 commented 5 months ago

Proposal: Opening/Closing Mechanism for Zip Files

Background

The Eclipse IDE has no built in functionality to open Zip Files and read or manipulate their content. Because of this, other operations like searching inside of Zip Files or comparing two Zip Files were also not possible.

Goal:

This pull request introduces a mechanism for handling Zip Files within the Eclipse workspace, enhancing the functionality to read and write Zip files. The primary goal is to provide a seamless experience for developers working with zip archives directly within Eclipse.

Description:

Zip files must be opened manually within the workspace by using the new command "Open Zip File" in the menu when right clicking the zip file. It is also possible to open nested zip files.

Zip Files are opened by replacing the file in the workspace with a linked folder that reads and writes the Zip File in the file system. By closing the Zip FIle, the linked folder will be deleted and the file can be seen in the workspace again.

Please note that only ZIP Archives are supported in this current implementation. Other archive types can be added in future improvements. Also linked Zip Files can not be opened with this implementation because the Zip File must be local.

An additional PR for the repository eclipse.platform.ui that grants access to the open/close mechanism for zip files over UI can be found in the following:

https://github.com/eclipse-platform/eclipse.platform.ui/pull/1947

Co-Authored-By: CodeLtDave david.erdoes@vector.com

laeubi commented 5 months ago

Given there is a Zip Filesystem for local files in the JDK already that support read/write I wonder if one really needs the "Open Zip FIle" step? Also the examples contain such a ZipFile system for eclipse already.

So I wonder if you maybe can explain why this design decision was made with linking / extraction of the Zip?

github-actions[bot] commented 5 months ago

Test Results

0 files   -   586  0 suites   - 586   0s :stopwatch: - 29m 48s 0 tests  - 3 980  0 :white_check_mark:  - 3 934  0 :zzz:  - 46  0 :x: ±0  0 runs   - 4 179  0 :white_check_mark:  - 4 132  0 :zzz:  - 47  0 :x: ±0 

Results for commit ba2cef4f. ± Comparison against base commit 0b69e82f.

:recycle: This comment has been updated with latest results.

mickaelistria commented 5 months ago

I think before diving in the code, @laeubi 's comment at https://github.com/eclipse-platform/eclipse.platform/pull/1413#issuecomment-2157982423 is most important to answer. IMO, if we're to include a Zip filesystem in Platform, it'd better be the one that is already in the example. UI also could be built on top of this filesystem in example. One benefit is that we could make progress with UI without being blocked by Platform.

laeubi commented 5 months ago

One benefit is that we could make progress with UI without being blocked by Platform.

Honestly I think we don't need any new UI but it seems I have missed the discussion about the requirements for this change, but take as an example the Classpath Container, if there is a jar file (what actually is a zip) it simply has a handle so I can expand it and it shows the content and I can double click files and open editors and so on, I would expect something similar for "something" that support open zip files without any need to open/close/extract/... UI because that's the whole point of a file-system abstraction that you can have "virtual" items pointing to something that is not a regular file.

laeubi commented 5 months ago

eclipsefdn/eca succeeds so I don't see an immediate issue here: grafik

iloveeclipse commented 5 months ago

eclipsefdn/eca succeeds so I don't see an immediate issue here:

OK. Haven't noticed the check was done for the other person as the PR owner. Looks weird however to see only one author checked. What about @Michael5601 ECA agreement then?

mickaelistria commented 5 months ago

Can someone (a project lead?) gives some opinions about the fact that we already have code for it in the codebase? IMO, this particular PR is not necessarily and we should only discuss whether we want to include the existing zip filesystem implementation in the main feature or not.

laeubi commented 5 months ago

Looks weird however to see only one author checked. What about @Michael5601 ECA agreement then?

I don't know the details @waynebeaton maybe can tell or we need a servicedesk ticket to clarify what / when is checked...

CodeLtDave commented 5 months ago

Given there is a Zip Filesystem for local files in the JDK already that support read/write I wonder if one really needs the "Open Zip FIle" step? Also the examples contain such a ZipFile system for eclipse already.

I appreciate everyone's interest in our project. A frequent topic of discussion has been the existing zip example provided in Eclipse. I want to clarify that our pull request builds upon this example, adding more functionality and integrating it into the main feature.

As @mickaelistria mentioned, the key question is whether this functionality should be part of the main feature, and I firmly believe it should be.

Regarding the nio Zip Filesystem available in the JDK, we utilize its capabilities extensively. Our pull request essentially incorporates the NIO Zip Filesystem into the existing example, resulting in significant performance improvements. Most of the logic can be found in the ZipFileStore.java where the nio Zip Filesystem is opened.

mickaelistria commented 5 months ago

I want to clarify that our pull request builds upon this example, adding more functionality and integrating it into the main feature.

OK, great. Thank you for clarifying! Do you think it would be possible for you to create a dedicated PR for those improvements and additional tests, but keeping the code in the "example" bundle for the moment? Such a PR could be merged in a breeze and be a valuable iteration while we still discuss the best place to host the filesystem in the end.

laeubi commented 5 months ago

@CodeLtDave can you explain why explicitly Open/Close is required? Can't we reuse the existing functionality (and possibly extend if something is missing here)?

What we already have

To make it more clear I just choose this random project with some java files and it looks like this grafik

Now I expand the handle and it looks like this: grafik

So I have a File (.java) and it shows we there is maybe more to explore and after that I see some more items that are "inside" the file...

As a user I would expect same/similar if it was not a java but a Zip-File for browsing the content similar to what I have for example with Plugin-Dependencies:

grafik

What would be new

Now if I open one file in an editor, I would like to be able to save this (and maybe getting asked if I really want to update the zip archive).

Next when I choose the File Search Dialog I would maybe expect some new option to include searching inside archives as well:

grafik

Conclusion

So to summarize I don't see we really need any new Actions for that (load/unload) but should use whats already is offered here. I haven't checked in detail how these functionality is implemented so maybe there are good reasons not to do it like this... but it would better match user expectation I think.

CodeLtDave commented 5 months ago

So I wonder if you maybe can explain why this design decision was made with linking / extraction of the Zip?

Regarding the functionality of the ZipFilesystem, as implemented in the example, we don't extract any files for opening or editing. Opening the Zip is akin to walking down the File Tree and collecting all necessary file information.

When a file is modified, it's done by opening the Zip with NIO and using the respective operations.

Based on our testing, this approach is much more lightweight and performant than always extracting the entire Zip and repackaging it. For small changes, this saves significant time and reduces overhead. We didn't investigate the internal workings of NIO in detail, so it's possible that some extraction and repackaging might be happening in the background. However, since it's much faster than manual extraction and repackaging, we believe this is not the case.

Regarding the approach with a linked resource, this seems to be the best method for linking any virtual FileSystem. Since it was already implemented this way in the example project, we believe it is appropriate and fitting for our use case.

jukzi commented 5 months ago

I wish ZIP is support from core. Currently i have to go windows file explorer to open zip files. awkward. Searching in zip files would be even better, but that can be a follow up step. At best JAR files are handled as ZIP files as well, because technically they are. The file ending is rather irrelevant.

laeubi commented 5 months ago

I think then this description is confusing:

Zip Files are opened by replacing the file in the workspace with a linked folder that reads and writes the Zip File in the file system. By closing the Zip FIle, the linked folder will be deleted and the file can be seen in the workspace again.

Why is it a folder then? (see example with java file that is not a folder at all)? As far as I understand the example it is done that way so you can link in any external zip file but here we want to show additional content for a file inside the workbench so this indirection should not be required (see java files and classpath jar files).

CodeLtDave commented 5 months ago

I wish ZIP is support from core. Currently i have to go windows file explorer to open zip files. awkward. Searching in zip files would be even better, but that can be a follow up step. At best JAR files are handled as ZIP files as well, because technically they are. The file ending is rather irrelevant.

@jukzi

The use case for searching in ZIP files is honestly the motivation for this project. Many colleagues, including myself, have often expressed frustration over this limitation. As you mentioned, JAR files can only be differentiated from ZIP files by their structure, specifically the presence of a META-INF/MANIFEST.MF file and possibly a Main-Class attribute.

Given that this is the platform repository, we have manually disabled support for JAR files. We plan to add JAR support, along with all the necessary functionality and tests, in the JDT (Java Development Tools) as suggested by @HeikoKlare.

CodeLtDave commented 5 months ago

I think then this description is confusing:

Zip Files are opened by replacing the file in the workspace with a linked folder that reads and writes the Zip File in the file system. By closing the Zip FIle, the linked folder will be deleted and the file can be seen in the workspace again.

I think you are right, that sounds confusing. Essentially, the zip file is hidden as long as there is a linked folder (or opened Zip) accessing the zip file. When it's closed, the link is deleted and the zip file is visible again.

Why is it a folder then? (see example with java file that is not a folder at all)? As far as I understand the example it is done that way so you can link in any external zip file but here we want to show additional content for a file inside the workbench so this indirection should not be required (see java files and classpath jar files).

I think I misunderstand your question. Every Zip/JAR is a compressed folder and is therefore displayed as such. We didn't change anything about that part from the example.

CodeLtDave commented 5 months ago

I want to clarify that our pull request builds upon this example, adding more functionality and integrating it into the main feature.

OK, great. Thank you for clarifying! Do you think it would be possible for you to create a dedicated PR for those improvements and additional tests, but keeping the code in the "example" bundle for the moment? Such a PR could be merged in a breeze and be a valuable iteration while we still discuss the best place to host the filesystem in the end.

Thank you for the input. I agree that splitting the addition of functionality and moving the example into two separate PRs could be beneficial. I will discuss this approach with my team. Considering the size of this PR and the necessary changes based on your comments, we will need some time to implement this.

laeubi commented 5 months ago

JAR files can only be differentiated from ZIP files by their structure, specifically the presence of a META-INF/MANIFEST.MF file and possibly a Main-Class attribute

Jar files are Zip files, a manifest is not mandatory nor any special headers: https://docs.oracle.com/en/java/javase/20/docs/specs/jar/jar.html#introduction

A JAR file is essentially a zip file that contains an optional META-INF directory

So I don't see why one should / would disable anything for them?

We didn't change anything about that part from the example.

The example is for linking a filesystem as a new item into the workbench, here we want to show additional childs for an existing file in the workbench. That's an important part I think and would avoid having to "shadow" an existing file, maybe such special resource must implement both IFolder and IFile as otherwise other parts of the system can no longer "see" the zip/jar as a file what could have some implications.

CodeLtDave commented 5 months ago

So to summarize I don't see we really need any new Actions for that (load/unload) but should use whats already is offered here. I haven't checked in detail how these functionality is implemented so maybe there are good reasons not to do it like this... but it would better match user expectation I think.

@laeubi

We share your opinion and are also not fully satisfied with the current UI usability for the ZipFileSystem. We have developed a version with a double-click handler. For now, we are relying on the implementation from the example. Implementing the existing UI features has proven to be very challenging, as the UI was not easily extensible for our needs. Since I was not involved in the UI development, I will try to put more focus on this aspect.

Given these challenges, this PR should perhaps be considered a draft. Our main goal is to gather feedback, such as yours, and discuss the possibility of moving the ZipFs into the main project.

laeubi commented 5 months ago

I think moving the bare implementation (== File system provider) into core should be something that is quite easy, as it would just offer the new FS type but not changes anything.

mickaelistria commented 5 months ago

I think moving the bare implementation (== File system provider) into core should be something that is quite easy, as it would just offer the new FS type but not changes anything.

FWIW, I would also support this move of the plain filesystem at the moment. Everything consensual should be ideally provided separately and merged so we can then focus on the friction points. I'm more reluctant about the particular additions to IResource (which I believe look like a good starting point but too intrusive for a good final solution).

CodeLtDave commented 5 months ago

JAR files can only be differentiated from ZIP files by their structure, specifically the presence of a META-INF/MANIFEST.MF file and possibly a Main-Class attribute

Jar files are Zip files, a manifest is not mandatory nor any special headers: https://docs.oracle.com/en/java/javase/20/docs/specs/jar/jar.html#introduction

A JAR file is essentially a zip file that contains an optional META-INF directory

So I don't see why one should / would disable anything for them?

Since Eclipse is primarily used for Java projects, I understand your perspective. However, I still believe it is not suitable to include this in the platform repository. As mentioned, there are additional concerns for JAR files that require thorough testing. All this will be addressed in a planned PR for the JDT (Java Development Tools).

If you still believe that everyone using an Eclipse product should have the logic for JARs included, and if this is the consensus, we can reconsider. However, it is important to consider users who use Eclipse for projects other than Java. For example, similar logic could apply to other file types such as WAR, EAR, APK, NPK, XPI, CRX, or VSIX.

We didn't change anything about that part from the example.

The example is for linking a filesystem as a new item into the workbench, here we want to show additional childs for an existing file in the workbench. That's an important part I think and would avoid having to "shadow" an existing file, maybe such special resource must implement both IFolder and IFile as otherwise other parts of the system can no longer "see" the zip/jar as a file what could have some implications.

Without fundamental changes to many parts of the platform, achieving this would be challenging. I spent two weeks exploring this approach and welcome any solutions, but from what I have seen, it cannot be done without very intrusive changes. The hardest part is determining an entry point. I couldn't find a place where I can dictate that this file should be treated with the ZipFileSystem as long as it has a URI for the local filesystem. There are many differences regarding the functional conditions of a LocalFileSystem and a virtual ZipFileSystem.

My main concern is that our PR adds new functionality for the user. If we were to treat the ZIP as a combination of file and folder, the behavior of a ZIP file would definitely change. I believe it must be a conscious decision to change the behavior from a file that cannot be opened or read to a folder.

If you have concrete advice on how we can get around the Linked Folder construction please let me know.

jukzi commented 5 months ago

I have not followed and understand everything written, but please keep in mind that a zip file needs to stay a IFile so that it can be for exampled read bytewise to copy it into another IFolder. On the other hand a zip archive mounted into the workspace has to be a IFolder as it can and should contain IFiles as children. As a special case there can be a Zip in a Zip recursively and at best a search would open all zips recursively. Best wishes! For me the best usecase of a mounted zip would be to compile all .class files directly into a zip/jar, so that compilation does not need to open so many small OS files. - Which would be a big performance gain on Windows. At best the java launcher would understand the archive is an archive and can be handled by java as classpath .jar instead of a directory. In a perfect world my whole workspace could be a single zip on the OS file system, but i doubt tools like "java" would understand a jar inside a zip :-)

iloveeclipse commented 5 months ago

Given these challenges, this PR should perhaps be considered a draft.

Yes, please convert to a draft.

What I would recommend also is to provide a detailed explanation

Based on that we could discuss where it is all going.

Note from our application point of view: we have several custom zip based file formats (both encrypted/not encrypted) for which we have dedicated custom editors. So from our application point of view even if our files are zip files inside, no other editor should be opened by double click on them, no other editor should be used to modify them and no extra indexing/searching should be done for the content, especially because some of the files can be very huge (up to 4 GB compressed) and may contain many thousand of (plain text) child elements. So whatever is developed here should allow application deny any of the mentioned operations on specific file types.

Michael5601 commented 5 months ago

So to summarize I don't see we really need any new Actions for that (load/unload) but should use whats already is offered here. I haven't checked in detail how these functionality is implemented so maybe there are good reasons not to do it like this... but it would better match user expectation I think.

@laeubi

We share your opinion and are also not fully satisfied with the current UI usability for the ZipFileSystem. We have developed a version with a double-click handler. For now, we are relying on the implementation from the example. Implementing the existing UI features has proven to be very challenging, as the UI was not easily extensible for our needs. Since I was not involved in the UI development, I will try to put more focus on this aspect.

Given these challenges, this PR should perhaps be considered a draft. Our main goal is to gather feedback, such as yours, and discuss the possibility of moving the ZipFs into the main project.

Thank you for your valuable input. We appreciate every suggestion that helps us improve this PR. I'd like to address some additional questions you raised that @CodeLtDave did not cover.

Initially, we considered using the existing implementation for "opening" files, as you suggested. However, we encountered a challenge: this functionality is JDT-based rather than Platform-based (Please correct me if I am wrong). As noted in the example project referenced several times in our discussion, it also does not utilize this functionality. Instead, it hides the zip files in the workspace behind linked folders, which can be expanded to view the zip file contents. These folders only appear after manually pressing a button in the UI.

In the long term I think we need to implement the JDT based approach and also the Platform based approach because these are different domains.

The biggest part of our research in the beginning of our project was to find a satisfying way to include files in the workspace that can be opened like folders. Every attempt failed except for the one we provided in this PR which is the same as in the example project. This implementation lead to more problems regarding linked resources but we managed to fix many of them.

In the PR for platform.ui I provided some screenshots for the workflow so it can be understood better.

Lastly, I want to mention that the file search can immediately search inside zip files if they are manually opened before initiating the search. This does not require additional functionality since all the files are located in the workspace. The filter button you mentioned will be added in a future improvement, but there are still some considerations regarding the UI and future implementations. Our team has already discussed this and decided not to include a filter in the first version.

I hope I could clarify some questions and looking forward to additional input.

Michael5601 commented 5 months ago

Given these challenges, this PR should perhaps be considered a draft.

Yes, please convert to a draft.

What I would recommend also is to provide a detailed explanation

  • what exactly is the main goal of the change
  • how the proposed solution works technically (highlighting most significant design decisions)
  • how the proposed solution works from user point of view (highlighting most significant UI behavior changes)
  • which other possible solutions were evaluated
  • pros/cons of current/evaluated solutions from technical and user point of view.

Based on that we could discuss where it is all going.

Note from our application point of view: we have several custom zip based file formats (both encrypted/not encrypted) for which we have dedicated custom editors. So from our application point of view even if our files are zip files inside, no other editor should be opened by double click on them, no other editor should be used to modify them and no extra indexing/searching should be done for the content, especially because some of the files can be very huge (up to 4 GB compressed) and may contain many thousand of (plain text) child elements. So whatever is developed here should allow application deny any of the mentioned operations on specific file types.

We will provide a detailed explanation addressing the open discussion points you mentioned. I hope this will clarify many of the issues raised in the discussion. I converted the PR to a draft.

waynebeaton commented 5 months ago

I don't know the details @waynebeaton maybe can tell or we need a servicedesk ticket to clarify what / when is checked...

Both contributors appear to have signed the ECA, so you should be good to go.

The ECA checker only checks the actual author of the commit. It does not check Also-Authored-By fields and similar.

I'll use this opportunity to remind everybody that the ECA checker is just a tool and that its purpose is to help committers understand when the contributors are covered by ECAs. No tool is perfect and we depend on committers to be our first line of defence in maintaining our IP regime. If you feel that something is out of place or missing, please engage with helpdesk (or EMO) so that we can help.

Finally... what the heck is the gender of a ZIP file?

jukzi commented 5 months ago

I just became aware that there is an existing https://github.com/bndtools/bnd/blob/master/bndtools.jareditor/src/bndtools/jareditor/internal/JarFileSystem.java which is also used in the eclipse IDE.

Michael5601 commented 5 months ago

I don't know the details @waynebeaton maybe can tell or we need a servicedesk ticket to clarify what / when is checked...

Both contributors appear to have signed the ECA, so you should be good to go.

The ECA checker only checks the actual author of the commit. It does not check Also-Authored-By fields and similar.

I'll use this opportunity to remind everybody that the ECA checker is just a tool and that its purpose is to help committers understand when the contributors are covered by ECAs. No tool is perfect and we depend on committers to be our first line of defence in maintaining our IP regime. If you feel that something is out of place or missing, please engage with helpdesk (or EMO) so that we can help.

Finally... what the heck is the gender of a ZIP file?

Thank you for the clarification. I used the term "gender" for zip files in commit f092e33aed648bd2b3329af3a4c0e39de1953d4a because of an issue with the synchronizeGender() method in RefreshLocalVisitor. This method checks if a folder in the workspace is also a folder in the file system. For zip files, we use linked folders in the workspace, but the corresponding resource in the file system remains a file, causing synchronization problems in synchronizeGender(). Therefore, in this context, "gender" refers to whether the resource is a folder or a file.

The provided workaround is temporary, and we are working on a better solution.

jukzi commented 5 months ago

For curiosity "gender" is used in various ancient code image

merks commented 5 months ago

I just became aware that there is an existing https://github.com/bndtools/bnd/blob/master/bndtools.jareditor/src/bndtools/jareditor/internal/JarFileSystem.java which is also used in the eclipse IDE.

PDE makes that library available:

image