dtrx-py / dtrx

Do The Right Extraction
GNU General Public License v3.0

.rar files depends on number of files #42

Closed petoor closed 1 year ago

petoor commented 1 year ago

Hello @noahp, I have an ETL project where I use dtrx as the backbone for unpacking a lot of different files. Sometimes there are .rar files that fail. I have been able to reduce this to a minimal reproduction.

Reproduction steps: create a folder called dtrx_test and run the following Python script inside it. It should create 708 txt files.

# 708 files: dtrx handles the resulting archive just fine.
for i in range(708):
    with open(f"{i}.txt", "w") as f:
        f.write("this is a txt file")

# 709 files: dtrx does not handle the resulting archive.
# for i in range(709):
#     with open(f"{i}.txt", "w") as f:
#         f.write("this is a txt file")

Compressing with:

rar a works.rar dtrx_test

yields a rar file that dtrx can extract. Running the script again with 709 txt files creates a rar file that dtrx can't extract.

dtrx works.rar --noninteractive
dtrx does_not_work.rar --noninteractive

However, running:

unrar -o+ x works.rar
unrar -o+ x does_not_work.rar

works for both archives, so unrar itself is able to handle them.

I've attached both files as a zip file. (Running dtrx dtrx.zip --recursive --noninteractive on the zip file should also fail.)

NB: Having subdirectories, e.g. dtrx_test/subdir/(a lot of files), changes the number of files at which extraction stops working; however, I think fixing one problem fixes the other.

dtrx.zip
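For anyone who wants to probe the threshold with different file counts or a subdirectory layout, here is a minimal sketch of a parametrized version of the script above. The function name `make_test_files` and its parameters are made up for illustration and are not part of dtrx.

```python
from pathlib import Path
from typing import Optional


def make_test_files(root: Path, count: int, subdir: Optional[str] = None) -> Path:
    """Create `count` small txt files under `root` (or `root/subdir`).

    Hypothetical helper for this reproduction only; returns the folder
    that received the files.
    """
    target = root / subdir if subdir else root
    target.mkdir(parents=True, exist_ok=True)
    for i in range(count):
        # Same file names and contents as the original script.
        (target / f"{i}.txt").write_text("this is a txt file")
    return target
```

For example, `make_test_files(Path("dtrx_test"), 709)` recreates the failing case, and passing `subdir="subdir"` gives the nested layout. The folder still has to be compressed with `rar a` afterwards.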

ChrisJefferson commented 1 year ago

Hi,

Just out of interest, what output / behaviour do you get from "does_not_work.rar"?

petoor commented 1 year ago

Nothing. It just hangs.
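Since a hang is hard to tell apart from a slow extraction inside a pipeline, a rough sketch of how it could be detected from Python follows. `run_with_timeout` is a hypothetical helper, not a dtrx feature, and any timeout value is an arbitrary assumption.

```python
import subprocess
from typing import List, Optional


def run_with_timeout(cmd: List[str], timeout_s: float) -> Optional[int]:
    """Run `cmd`; return its exit code, or None if it did not finish in time."""
    try:
        completed = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # never wait on interactive input
            stdout=subprocess.DEVNULL,  # output is discarded in this sketch
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,          # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

With this, `run_with_timeout(["dtrx", "does_not_work.rar", "--noninteractive"], 60)` would return `None` if dtrx is still running after 60 seconds, instead of blocking forever.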

ChrisJefferson commented 1 year ago

Thanks, I've tracked down the problem (pondering how to fix it)