adamhathcock / sharpcompress

SharpCompress is a fully managed C# library to deal with many compression types and formats.
MIT License

lzma/7z implementation #91

Closed. weltkante closed this issue 8 years ago

weltkante commented 9 years ago

Hi, I'm the author of the managed-lzma project and noticed you have some branches integrating parts of my project (and are still linking to the deprecated Google Code page, by the way). If there is any interest in better integration, we could discuss how to organize possible contributions from my side. Feel free to discuss it here or contact me through the mail on my profile.

PS: right now I'm working on a cleaned up public LZMA/7z API replacing the experimental API mess in my current source, it should be ready in 1-2 weeks.

adamhathcock commented 9 years ago

I'm more than happy to have better integration with the work you're doing.

My issue is that, between work and new children, I can't give this project the attention I used to at the moment.

What did you have in mind? Would you like to submit pull requests with updated code and usage? Something else? I'm open to anything :)

You're welcome to do as much as you'd like. Currently, I'm really just stewarding things as people submit fixes or do small fixes myself.

weltkante commented 9 years ago

Sure, I can do pull requests if that's what works best. I don't really have any experience with keeping multiple GitHub projects in sync.

adamhathcock commented 9 years ago

I think this is a good link to explain fork and pull request workflows with GitHub: http://blog.scottlowe.org/2015/01/27/using-fork-branch-git-workflow/

If you think you'll really want to dig in and do more with SharpCompress, we can look at making an org with shared ownership. Most people don't want to do that :)

weltkante commented 9 years ago

Hm, as far as I understand it, that is just about working on a single project, not keeping distinct projects in sync. I can't really do pull requests from my project into this one, just from a fork back to the master, right? So if you want a copy of the lzma/7z core within SharpCompress (instead of referencing a NuGet package or something), there needs to be a way to get changes from my project to this project (not necessarily automated).

Anyways, assuming I should do pull requests, which branch would I do that against? CodePlex seems to have two branches (managedlzma and new7zipformat) and GitHub has new_7zip. I didn't look too closely into the differences yet. Master seems to still be based on the old lzma-sdk implementation.

adamhathcock commented 9 years ago

Ah yes. We can't go between the two repos. I could make it a NuGet dependency, but I like that there are currently no dependencies.

If you were to basically copy/paste, just do pull requests against master.

weltkante commented 9 years ago

Sure, that's fine with me. Should I start the integration from scratch or look into your existing work integrating my library, as a reference of how it should be done as far as SharpCompress is concerned? If so, which branch should I take a look at? Are all 3 still relevant?

adamhathcock commented 9 years ago

Hopefully you don't have to modify much to get your new work integrated. The current usage in SharpCompress should be close to what you need.

The branches are probably pretty old and out of date.

weltkante commented 8 years ago

I've taken a look at the repository, and it seems to be a mix of contributions from different sources. It might be interesting to clarify where the different components originate from.

From a quick look I guess that the LZMA implementation comes from the LZMA SDK, which is bad, because it hasn't been maintained for years (it was a one-time port and has never been updated). Even worse, from what I can tell it has been ported from the Java code in the LZMA SDK, and as such may have bugs regarding signedness of variables (Java has no unsigned integer types).
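
To illustrate the kind of signedness bug meant here, the following is a minimal sketch, not code from the LZMA SDK or from SharpCompress. It emulates 32-bit integers in Python and shows how a right shift and a comparison give different answers depending on whether the same bits are treated as signed (as a Java `int` would be) or unsigned (as in C). The helper names and the sample values, including the shift by 11, are assumptions chosen for illustration.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, like a fixed-width register


def as_signed32(value):
    """Reinterpret the low 32 bits as a two's-complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def arithmetic_shift_right(value, bits):
    """Shift that replicates the sign bit (what a signed shift does)."""
    return (as_signed32(value) >> bits) & MASK32


def logical_shift_right(value, bits):
    """Shift that fills with zeros (what an unsigned shift does)."""
    return (value & MASK32) >> bits


def signed_less_than(a, b):
    """Compare two 32-bit patterns as signed integers."""
    return as_signed32(a) < as_signed32(b)


def unsigned_less_than(a, b):
    """Compare two 32-bit patterns as unsigned integers."""
    return (a & MASK32) < (b & MASK32)


if __name__ == "__main__":
    full_range = 0xFFFFFFFF  # illustrative value with the top bit set
    print(hex(arithmetic_shift_right(full_range, 11)))
    print(hex(logical_shift_right(full_range, 11)))
    print(signed_less_than(0x80000000, 0x7FFFFFFF))
    print(unsigned_less_than(0x80000000, 0x7FFFFFFF))
```

A port that silently switches between the two interpretations only misbehaves for values with the top bit set, which is why such bugs can survive casual testing.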

When I played with the C# port in the lzma sdk its encoder produced results which were inconsistent with the C/C++ version so I consider it highly likely that there are bugs in that port. Can't really tell though if these are critical in the sense that they could produce unreadable data, or if they just produce a different encoding describing the same data.
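
The distinction between "unreadable data" and "a different encoding describing the same data" can be shown with a small sketch. It uses Python's standard `lzma` module (not the LZMA SDK port discussed here) to compress one payload with two presets; the payload and preset choices are assumptions for illustration. The two streams need not be byte-identical, yet both decode to the original input.

```python
import lzma

# Illustrative payload; any byte string would do.
payload = b"SharpCompress and managed-lzma " * 200

# Same algorithm family, different encoder settings (presets 0 and 9).
fast = lzma.compress(payload, format=lzma.FORMAT_ALONE, preset=0)
thorough = lzma.compress(payload, format=lzma.FORMAT_ALONE, preset=9)

# Both streams must decode to the original bytes, whatever their contents.
decoded_fast = lzma.decompress(fast)
decoded_thorough = lzma.decompress(thorough)

if __name__ == "__main__":
    print("streams identical:", fast == thorough)
    print("both round-trip:", decoded_fast == payload and decoded_thorough == payload)
```

A round-trip check like this only proves that one decoder accepts the stream. It says nothing about whether two encoders make the same decisions, which is the stronger property discussed further down.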

Then there is some LZMA2 code and (parts of) a 7z reader whose origin I don't recognize. And of course parts of my codebase :-)

Right now I'm concentrating my time on writing a clean implementation for 7z archives in my repository (new-api branch); once that's done I'll probably do a pull request to integrate the work into SharpCompress.

In a second step I would, if it's OK with you, start replacing the LZMA encoders. In the past I've spent a lot of time in my project to come up with a test system that checks for binary-identical output between my implementation and the C/C++ reference code (by running both algorithms in lock-step; I even found a few bugs in my port this way). When writing lzma/7z archives it would be much more reassuring if the LZMA encoders produced the same output as the C/C++ version, because as long as they don't, you can never be sure that your implementation will always produce compatible output.
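
The lock-step idea can be sketched roughly as follows. This is not the test system from managed-lzma; it is a toy harness under stated assumptions: both encoders are modelled as generators that yield one output chunk per step, and a hypothetical run-length encoder (`rle_steps`) stands in for the real LZMA encoders. The "port" caps run lengths at 127, as if the count had been stored in a signed byte.

```python
from itertools import zip_longest


def rle_steps(data, max_run):
    """Toy run-length encoder: yields one (count, byte) chunk per run."""
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < max_run:
            run += 1
        yield bytes([run, data[i]])
        i += run


def first_divergence(reference_steps, port_steps):
    """Advance both encoders one step at a time.

    Returns (step_index, reference_chunk, port_chunk) for the first step
    whose output differs, or None if the two runs are identical. A missing
    step on either side shows up as None in the tuple.
    """
    paired = zip_longest(reference_steps, port_steps, fillvalue=None)
    for index, (ref, port) in enumerate(paired):
        if ref != port:
            return index, ref, port
    return None


if __name__ == "__main__":
    data = b"a" * 200 + b"b" * 3
    reference = rle_steps(data, max_run=255)  # stand-in for the reference encoder
    port = rle_steps(data, max_run=127)       # stand-in for a port with a signedness bug
    print(first_divergence(reference, port))
```

Comparing step by step instead of diffing the final files points at the first decision where the two implementations disagree, which is usually much closer to the actual bug.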

adamhathcock commented 8 years ago

I will admit the LZMA and 7Zip stuff is jury-rigged :) I avoided getting deep into binary algorithms as much as possible.

Anything you do can only improve the code.

weltkante commented 8 years ago

Sorry for the long silence; it took me much longer to rewrite my 7z implementation than anticipated ;-)

I've now got my side working; next year I'll start bringing some of the work over here.

adamhathcock commented 8 years ago

Sounds good, thanks. No rush as I'm not exactly active here either.

weltkante commented 8 years ago

I'm really sorry to say that unexpected changes in my life schedule make it impossible to provide the contribution I originally wanted when opening the issue, so I'm closing it for now. Sorry for waiting so long; I had hoped to get back to coding, but unfortunately I no longer have the time required to clean up, debug or port a full implementation.

If anyone else wants to make an attempt at improving the implementation, my own library will of course stay available and can be used as a reference (or even copied into SharpCompress where applicable).

If there is need for advice, discussion or debugging I may still be able to help out, so feel free to attempt to ping me in these cases.

adamhathcock commented 8 years ago

Thanks. I completely understand and I have my own issues with working on this too.