Casvt / Kapowarr

Kapowarr is a software to build and manage a comic book library, fitting in the *arr suite of software.
https://casvt.github.io/Kapowarr/
GNU General Public License v3.0

Downloads hang on importing #148

Closed PhAzE-Variance closed 2 months ago

PhAzE-Variance commented 2 months ago

Description of the bug

The application will locate a file or group of files and seem to work when downloading to 100%, then it will stall out in the importing step sometimes. A restart of the app seems to allow it to move on to the next file but the stalled file fails to import. Issue appears to be with the new conversion process based on the log file error.

To Reproduce

Expected behaviour

I expect downloads to complete and import, or for errors to show in the log. It does appear to be logging a conversion error, as posted below.

Screenshots

N/A

Version info

Kapowarr: v1.0.0-beta-4
Python: 3.8.17.final.0
DB version: 14

Running as Docker on Unraid 6.12.8

Additional context

Exception in thread Download Handler:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/app/backend/download_queue.py", line 110, in __run_download
    PostProcesser.success(download)
  File "/app/backend/post_processing.py", line 240, in success
    cls._run_actions(cls.actions_success, download)
  File "/app/backend/post_processing.py", line 234, in _run_actions
    action(download)
  File "/app/backend/post_processing.py", line 106, in convert_file
    mass_convert(
  File "/app/backend/conversion.py", line 263, in mass_convert
    convert_file(
  File "/app/backend/conversion.py", line 81, in convert_file
    return conversion_class.convert(file)
  File "/app/backend/converters.py", line 309, in convert
    rar_file = ZIPtoRAR.convert(file)
  File "/app/backend/converters.py", line 181, in convert
    volume_id: int = cursor.execute("""
TypeError: 'NoneType' object is not subscriptable

Casvt commented 2 months ago

Ah there's the error, couldn't believe there wouldn't be one.

The error suggests that the filename is invalid. What was the filename?

PhAzE-Variance commented 2 months ago

The Comic (2016) - Volume 05 - Issue 065 - 078.zip

(I changed the name for the "Comic")

PhAzE-Variance commented 2 months ago

Just a guess here, but it's basically running on Linux, is it possible the spaces are not being delimited properly?

Casvt commented 2 months ago

The following is happening:

  1. The file has completed downloading.
  2. The download is removed from the queue (that's why after restarting, you can continue).
  3. The file is moved to the volume folder.
  4. A scan is done for the volume, so that the file gets matched to its issues.
  5. The file is converted (in your case from zip to rar). This is where the error happens.
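
As a rough illustration of why a failure in the last step leaves the download stuck, here is a minimal sketch of an ordered action chain. The traceback only confirms that `_run_actions` calls each `action(download)` in turn; the function names, the `dict` used for the download and the action list below are hypothetical stand-ins, not Kapowarr's actual code.

```python
from typing import Callable, List

# Hypothetical stand-ins for steps 2 to 5 above; not Kapowarr's real functions.
def remove_from_queue(download: dict) -> None:
    download["log"].append("removed from queue")

def move_to_volume_folder(download: dict) -> None:
    download["log"].append("moved to volume folder")

def scan_volume(download: dict) -> None:
    download["log"].append("volume scanned")

def convert(download: dict) -> None:
    # Simulates the failure reported in the traceback.
    raise TypeError("'NoneType' object is not subscriptable")

ACTIONS: List[Callable[[dict], None]] = [
    remove_from_queue,
    move_to_volume_folder,
    scan_volume,
    convert,
]

def run_actions(download: dict) -> None:
    # Actions run in order; an exception in one stops everything after it.
    for action in ACTIONS:
        action(download)

download = {"log": []}
try:
    run_actions(download)
except TypeError as error:
    download["log"].append(f"failed: {error}")

for entry in download["log"]:
    print(entry)
```

In this sketch the earlier steps have already taken effect by the time the conversion raises, which matches the observation that the file is in the volume folder and the queue entry is gone while the import never finishes.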

The error and its context suggest that the file could not be traced back to the volume that it's for. This could have a few reasons that we should explore:

  1. The filename is invalid, making it not match to the volume (that's why I asked for the filename). Because it's not matched, when we try to find the volume linked to the file, we of course get nothing.
  2. A bit more complicated: the moving of files on unraid happens asynchronously (that is the theory at least; I don't know), and the scanning is done before the file is actually present at its new location. I hope very much that this is not the case because that's hell to fix, but we'll have to see.
  3. The filepaths in the database and in the variable are not the same. I doubt this, as this is only really a possibility on Windows (e.g. \path\to\file vs \\path\\to\\file).
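
For reference, this is how a lookup that finds no row produces exactly this kind of `TypeError` with Python's `sqlite3` module. It is a minimal sketch under assumptions: the query at `converters.py` line 181 is truncated in the traceback, so the table, columns and paths here are hypothetical, and it is only assumed that the real code subscripts the result of `fetchone()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# Hypothetical table mapping a file path to the volume it is matched to.
cursor.execute("CREATE TABLE files (filepath TEXT PRIMARY KEY, volume_id INTEGER)")
cursor.execute(
    "INSERT INTO files VALUES (?, ?)",
    ("/content/Comic/issue-065.cbz", 7),
)

QUERY = "SELECT volume_id FROM files WHERE filepath = ?"

def volume_id_unchecked(filepath: str) -> int:
    # fetchone() returns None when no row matches, so [0] raises TypeError.
    return cursor.execute(QUERY, (filepath,)).fetchone()[0]

def volume_id_checked(filepath: str):
    # Same lookup, but an unmatched file yields None instead of an exception.
    row = cursor.execute(QUERY, (filepath,)).fetchone()
    return row[0] if row is not None else None

print(volume_id_checked("/content/Comic/issue-065.cbz"))
print(volume_id_checked("/content/Comic/leftover.html"))
```

Calling `volume_id_unchecked` with the unmatched path raises `TypeError`, which is the situation all three possibilities above would lead to.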

So to start with: does the zip file still exist in the volume folder? If so, if you run a refresh & scan for the volume, is the file matched to its issues (check mark next to the issues, click on an issue and then the 'files' tab to see if it's the zip file)?

PhAzE-Variance commented 2 months ago

Where is the volume folder?

The only actual configured settings in the docker are:

/app/temp-download
/app/db
/content

When the import gets stuck, there is no file in the temp-download directory anymore. Does it get moved to another location to unzip or does it unzip in that directory?

EDIT: Sorry, long day, the 'volume' folder for the comic. On unraid, this for me is in the /mnt/user/whatever share, which is set to push all changes to a cached drive. The app will not notice if this volume is created on a cached drive or the array itself because it's seamless from unraid.

The cache drive only dumps to the array once per day in my case, during the middle of the night, so it's not unraid moving the file away.

PhAzE-Variance commented 2 months ago

Having inspected the volume folders for a few comics, I do see leftover files there which could be causing the issue. For example, the specific one that failed has 2 files in its volume folder:

The\ Comic\ (2016)\ -\ Volume\ 05\ -\ Issue\ 065\ -\ 078.html
The\ Comic\ (2016)\ -\ Volume\ 05\ -\ Issue\ 065\ -\ 078.zip

So yes the file does still exist in the volume directory. Also, it might make sense to do this extraction and conversion process in the temp-downloads directory instead of the final volume because errors or problems will litter leftover files in places that will be harder to locate.

Casvt commented 2 months ago

Kapowarr downloading the HTML file is fixed already and the fix will be included in the next release. The zip file being in the volume folder is good. That means it was successful in moving the file. BTW, the volume folder is indeed the folder where the downloaded issues for a volume are stored.

So the file was moved. But a second later, it's not linked to the volume. Let's check whether it gets linked afterwards. In the web-ui, could you go to the relevant volume? Then, in the list of issues, could you check if the issues 65 to 78 have a check mark on the right? If not, do a 'Refresh & Scan', refresh the webpage and check again. The check mark ✅ (compared to a cross ❌) means that the issue is downloaded and matched to the volume.

That information will greatly narrow down where the problem lies.


> Also, it might make sense to do this extraction and conversion process in the temp-downloads directory instead of the final volume because errors or problems will litter leftover files in places that will be harder to locate.

My software might have some bugs, but all desired behaviour is thought out. There's a decision behind each behaviour. The reason all the conversion is happening in the volume folder is because that is also where it will happen when the conversion is manually triggered. When you manually click the 'Convert' button in the tool bar and go for it, all the files are converted in the volume folder. It would be illogical to move them all to a different folder just to convert them and move them all back. Imagine if you have tens of gigabytes of files... So we do the same for downloaded media. We move them to the volume folder, then we tie into the conversion system to convert them like they would be if they were manually selected for conversion. Doing it elsewhere just to have a more central place for leftover files is avoiding the problem instead of fixing it. There shouldn't be any leftover files, so if there are, the algorithm should just be fixed.

PhAzE-Variance commented 2 months ago

Yea, it looks like the files are checked, and extracted. (At this point, I have disabled the conversion option in settings.) While the issues are checked, the original file still also exists in the volume folder.

So there are two things happening. If the convert option is enabled, it will download, extract, and fail to convert. If the convert option is disabled, it will download, extract, and fail to delete. Both scenarios end up with the download stuck at importing, preventing any more downloads from starting.

I'm wondering if the filename or path variable that is being used is getting a null value somewhere after the extract command, and is then being passed to the convert routine and the delete routine as null. It doesn't always happen though, just most of the time. One scenario I have not tested is disabling the "Extract archives covering multiple issues" setting to see if it still happens.

PhAzE-Variance commented 2 months ago

I cleared out the volume folder entirely (because it had too many leftover files from previous attempts) and gave it another shot. It seems that the original file is left in the volume directory and the extraction does NOT occur, yet the issues do have a check mark. This is with "Extract archives covering multiple issues" enabled and the convert option disabled. If I enable the convert option, I get the error from the first post. Sorry, I know that probably makes this more convoluted.

Casvt commented 2 months ago

Okay, I think I'm following. I've already found part of the problem; it is already solved and the fix will be included in the next release.

In your version, after extracting the issues from the zip file, it simply converts all the files it finds in the volume folder. For the next release, that's fixed: only the files that actually came out of the zip file will be converted, not every file in the folder.
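
The difference between the two behaviours can be sketched as follows. This is only an illustration under assumptions, not the actual fix: `extract_archive` and the file names are hypothetical, and the conversion step itself is left out.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List

def extract_archive(archive: Path, folder: Path) -> List[Path]:
    """Extract `archive` into `folder` and return only the files it produced."""
    with zipfile.ZipFile(archive) as zf:
        names = [info.filename for info in zf.infolist() if not info.is_dir()]
        zf.extractall(folder)
    return [folder / name for name in names]

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)

    # An unmatched leftover file that is already in the volume folder.
    (folder / "leftover.html").write_text("not a comic")

    # A downloaded archive covering two issues.
    archive = folder / "issues.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("issue-065.cbz", "a")
        zf.writestr("issue-066.cbz", "b")

    extracted = extract_archive(archive, folder)

    # Old behaviour: every file in the folder is a conversion candidate.
    everything = sorted(p.name for p in folder.iterdir())
    # New behaviour: only what came out of the archive is a candidate.
    to_convert = sorted(p.name for p in extracted)

print(everything)
print(to_convert)
```

Keeping the list returned by the extraction step means leftover files in the folder never reach the converter in the first place.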

This breaks things for you because you have unmatched files in your volume folder, such as the HTML file and probably some other leftovers. Kapowarr attempts to convert them even though they are not matched to the volume, which raises the error I described earlier for a file that is not matched to a volume.

That means that, in theory, if you only have files in the volume folder that are actually matched to the volume, the conversion should work fine. So only issue files, zip files for multiple issues, etc., and not HTML files and the like.

However, you report that after you emptied the whole volume folder, it still raised an error when converting. Please check again what the contents of the folder are, and especially whether there are unmatched files inside, like HTML files which Kapowarr might have decided to download again.

PhAzE-Variance commented 2 months ago

You might be right. After clearing the directory again it worked and converted all as expected.

One thing to note is files that are redownloaded will populate a (1) or (2) version of the file instead of replacing the existing file. Looking forward to the beta 5 release otherwise.

Casvt commented 2 months ago

> files that are redownloaded

Files shouldn't be redownloaded. Once an issue is marked as downloaded, Kapowarr shouldn't download for it again.

> populate a (1) or (2) version of the file instead of replacing the existing file

If a file is downloaded and moved to its destination, but a file with that name already exists there, then the existing file should be replaced. In your version there is a bug which deletes the downloaded file in this scenario. So there shouldn't even be two files with the same name in your folder, because the bug prevents you from downloading the same file multiple times.
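
A minimal sketch of the intended "replace on move" behaviour, assuming source and destination are on the same filesystem. `move_replacing` is a hypothetical helper written for this illustration, not Kapowarr's function; `os.replace` is the standard-library call that overwrites an existing destination file.

```python
import os
import tempfile
from pathlib import Path

def move_replacing(source: Path, destination: Path) -> None:
    # os.replace overwrites an existing destination file.
    # Assumption: both paths live on the same filesystem.
    os.replace(source, destination)

with tempfile.TemporaryDirectory() as tmp:
    download_folder = Path(tmp) / "temp-download"
    volume_folder = Path(tmp) / "volume"
    download_folder.mkdir()
    volume_folder.mkdir()

    # A file with the same name already exists in the volume folder.
    existing = volume_folder / "issue-065.cbz"
    existing.write_text("old")

    downloaded = download_folder / "issue-065.cbz"
    downloaded.write_text("new")

    move_replacing(downloaded, existing)

    content_after_move = existing.read_text()
    source_still_exists = downloaded.exists()
    files_in_volume = sorted(p.name for p in volume_folder.iterdir())

print(content_after_move, source_still_exists, files_in_volume)
```

With this approach the folder ends up with a single file under the original name rather than a numbered copy.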

> Looking forward to the beta 5 release otherwise.

Next release will be a stable release ;)