Nandaka / NijieDownloader

nijie.info downloader
http://nandaka.devnull.zone/

[1.0.1.0] Problem: batch downloading gets stuck for unknown reason #11

Closed reyaz006 closed 10 years ago

reyaz006 commented 10 years ago

Not really sure why this is happening.

I'm trying to batch download by member ID. For the first member, it downloaded the first image and then got stuck, without moving on to the next image. Re-adding the job doesn't do anything; it says "Skipped / Running". The log file doesn't report anything beyond that either.

For the second member, it was able to download 2 images, then it got stuck again.

If I try to Pause/Resume, it just reports "Running" and nothing progresses. If I try to Stop, it says "Canceling", and Start no longer works for this job (so I have to either re-add the job or restart the app to continue). The log file is not updated with any errors or exceptions either.

I also tried removing the Database file, it didn't help.

Today I'm getting a quite slow connection to many foreign servers (nijie included), so I also tried using a proxy. This helped with the speed but didn't solve the problem.

Just in case, here is filename format I'm using: {memberId}\{imageId}_{page}_{imageTitle}_{serverFilename}

Nandaka commented 10 years ago

Can you provide step-by-step instructions to replicate this?

FYI, if you start a job and then add a new one, the new one won't start automatically.

reyaz006 commented 10 years ago

From the start, with no settings file and database file:

Log file contains:

2014-03-08 13:01:14,467 INFO [ 1] - Upgrading configuration
2014-03-08 13:01:14,493 DEBUG [ 1] - Proxy=
2014-03-08 13:01:14,495 INFO [ 1] - Nijie Downloader v1.0.1.0 started.
2014-03-08 13:01:16,702 INFO [ 1] - Tracking 0 image(s)
2014-03-08 13:01:53,185 INFO [ 9] - Loggged Out
2014-03-08 13:01:57,413 INFO [ 9] - Logged In
2014-03-08 13:04:09,613 DEBUG [ 14] - Running Member Job: Member ID: 38 StartPage: 1 EndPage: 0 Limit: 0
2014-03-08 13:04:11,070 DEBUG [ 14] - Processing Image:75035
2014-03-08 13:04:12,449 DEBUG [ 14] - Downloading url: http://pic03.nijie.info/nijie_picture/38_20140301142431.png ==> <my root folder>\38\75035__陰毛の陰謀_38_20140301142431.png
EOF.

After I close the app, this is added:

2014-03-08 13:07:49,434 INFO [ 1] - Nijie Downloader v1.0.1.0 closed.

Also tried v1.0.0.2 now and it can't even save any file anymore. Log says something about System.Net.WebException and System.IO.IOException.

reyaz006 commented 10 years ago

Tried v1.0.2.0 - same thing.

Nandaka commented 10 years ago

Are you using Windows XP? Try updating (repairing) .NET Framework 4.

reyaz006 commented 10 years ago

I'm using Windows 8.1 x64. All .NET Frameworks are already updated through the Update center. The Control Panel doesn't show them in the installed programs list, so I see no option to repair anything. I'll test it under a Windows XP VM later.

I wonder if this is caused by the different deployment method. Could you do a test build with the default installation method?

Nandaka commented 10 years ago

I don't have the installer anymore; just extract the archive to any directory and run it. Tested on Win7 64-bit and XP SP3 32-bit. I'll upload a new version with more detailed logging later.

reyaz006 commented 10 years ago

That's how I have been (trying to) use it since 1.0.1.0; there is no other way.

Tried it under my XP VM machine and it can't even run: System.Data.SqlServerCe.SqlCeException: The specified locale is not installed on this machine. Make sure you install the appropriate language pack. [ LCID = 1041 ]

I suppose it requires a Japanese language pack for .NET Framework which is not installed. Tried to install it from http://www.microsoft.com/ja-jp/download/details.aspx?id=23067 but it didn't help at all.

Nandaka commented 10 years ago

Try this patch: http://www.mediafire.com/download/44tl900u1azrh12/nijieDownloader.1.0.2.1.patch.7z Just overwrite the old files.

Regarding XP: I think you need to set Regional Settings -> Language for non-Unicode programs to Japanese, or at least install the Japanese language pack (so it can show Japanese text).

FYI, my Win7 is running in English.

reyaz006 commented 10 years ago

Thanks for this.

It now saves all the images, as far as I can see. Tested with member id 38. Here are brief log details from while the app was running.

Per item:

2014-03-17 16:19:41,486 DEBUG [ 4] - Downloading url: http://pic03.nijie.info/nijie_picture/38_20140312170650_0.png ==> D:\Temp\Nijie\38 - 76268 - チンポップ君の日常 公式漫画 殺し屋ンポ 暗殺.png
2014-03-17 16:19:44,550 ERROR [ 4] - Failed to save to DB: 76268
System.Data.Entity.Infrastructure.DbUpdateException: An error occurred while saving entities that do not expose foreign key properties for their relationships. The EntityEntries property will return null because a single entity cannot be identified as the source of the exception. Handling of exceptions while saving can be made easier by exposing foreign key properties in your entity types. See the InnerException for details.
---> System.Data.Entity.Core.UpdateException: An error occurred while updating the entries. See the inner exception for details.
---> System.Data.SqlServerCe.SqlCeException: An overflow occurred while converting to datetime.

On the last image:

2014-03-17 16:20:49,795 DEBUG [ 4] - Downloading url: http://pic04.nijie.info/nijie_picture/38_20130829200015.png ==> D:\Temp\Nijie\38 - 58951 - ニジエたん ずらし挿入.png
2014-03-17 16:20:52,314 ERROR [ 4] - Failed to save to DB: 58951
System.Data.Entity.Infrastructure.DbUpdateException: An error occurred while updating the entries. See the inner exception for details.
---> System.Data.Entity.Core.UpdateException: An error occurred while updating the entries. See the inner exception for details.
---> System.Data.SqlServerCe.SqlCeException: A duplicate value cannot be inserted into a unique index. [ Table name = NijieTags,Constraint name = PK_dbo.NijieTags ]

The same things are logged if all the files already exist (if I run the same job again).

My Windows is Russian, and in my experience most environment problems happen on an OS whose language differs from the developer's; a non-English OS in this case. The same goes for date/size/number formats: they may differ between regions. For example, if you try to process the number "1.05" under an OS where decimals are supposed to look like "1,05", you're gonna have a bad time.
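The decimal-separator pitfall mentioned here can be illustrated with a minimal sketch. It is written in Python for brevity and is not taken from NijieDownloader; `parse_decimal` and `parses_with_dot_rules` are hypothetical helper names. The idea is that the separator is chosen explicitly by the caller rather than picked up from the OS regional settings.

```python
def parse_decimal(text: str, decimal_sep: str = ".") -> float:
    """Parse a decimal string using an explicitly chosen separator.

    Hypothetical helper: the separator is passed in by the caller instead of
    being taken from the operating system's regional settings.
    Thousands separators are deliberately not handled in this sketch.
    """
    if decimal_sep != ".":
        text = text.replace(decimal_sep, ".")
    return float(text)


def parses_with_dot_rules(text: str) -> bool:
    """Return True if `text` parses when '.' is assumed to be the separator."""
    try:
        float(text)
        return True
    except ValueError:
        return False
```

With these helpers, `parses_with_dot_rules("1,05")` is false while `parse_decimal("1,05", ",")` succeeds, which is the mismatch the comment describes.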

Nandaka commented 10 years ago

This should not be the case: I pass a DateTime value, and SQL CE supports the native .NET types as far as I know.

Anyway, can you set Trace DB = true in the Download Settings, clear the log files, and then run it again? I want to see what data is being generated.

Also, try setting concurrent job = 1 and see whether a race condition when inserting the Tags was the cause.
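Whether or not a race is involved, the "duplicate value cannot be inserted into a unique index" error is what a plain insert produces when the tag row already exists. The sketch below shows that failure and an insert-if-absent alternative. It uses Python's built-in `sqlite3` module as a stand-in for SQL CE, and the `tags` table and function names are assumptions for illustration, not the app's actual schema or code.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a tag table keyed by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tags (name TEXT PRIMARY KEY)")
    return conn


def insert_tag_naive(conn: sqlite3.Connection, name: str) -> None:
    """Plain insert: raises sqlite3.IntegrityError if the tag already exists."""
    conn.execute("INSERT INTO tags (name) VALUES (?)", (name,))


def insert_tag_idempotent(conn: sqlite3.Connection, name: str) -> None:
    """Insert-if-absent: an existing tag is silently left as it is."""
    conn.execute("INSERT OR IGNORE INTO tags (name) VALUES (?)", (name,))


def count_tags(conn: sqlite3.Connection) -> int:
    """Return the number of rows in the tag table."""
    return conn.execute("SELECT COUNT(*) FROM tags").fetchone()[0]
```

`INSERT OR IGNORE` is SQLite syntax; other engines would need a check-then-insert inside a transaction or an equivalent construct.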

Btw, I tested on Win7 English (no changes to the non-Unicode setting) and on WinXP with the non-Unicode language set to Japanese.

reyaz006 commented 10 years ago

Concurrent job was already = 1. Like I said previously, with 1.0.2.0 I haven't changed any settings, not even the default download folder.

Enabled Trace DB and repeated member id = 38 batch. Here is the log file: http://pastebin.com/gKuMz1KT

Nandaka commented 10 years ago

Checking the DB trace, the date is somehow being parsed as the minimum time.

Which is weird; on my Win7 it parsed correctly... https://github.com/Nandaka/NijieDownloader/blob/master/NijieDownloader.Library/Nijie.Image.cs#L136

Or maybe it is because of the extra trailing text ==> 2014-03-12 17:06:54に投稿 (に投稿 means "posted on").
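One way to make the parse independent of the trailing text and of regional settings is to extract only the timestamp and parse it with an explicit format. The sketch below is in Python and is not the code at the link above; `parse_work_date` is a hypothetical name, and the assumption is that the page always shows the date as `YYYY-MM-DD HH:MM:SS`, as in the sample string.

```python
import re
from datetime import datetime
from typing import Optional

# Matches a timestamp such as "2014-03-12 17:06:54" anywhere in the text.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def parse_work_date(raw: str) -> Optional[datetime]:
    """Extract and parse the first timestamp found in `raw`.

    Returns None when no timestamp is present, so the caller can tell a
    failed parse apart from a real date instead of receiving a default value.
    """
    match = TIMESTAMP_RE.search(raw)
    if match is None:
        return None
    # The explicit numeric format does not depend on the OS locale.
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S")
```

Returning `None` on failure, rather than a default minimum date, keeps an unparsed value from silently reaching the database.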

I'll fix it later :smile:

reyaz006 commented 10 years ago

Getting similar exceptions in 1.0.3.0, except that they occur only for some images, with no apparent pattern:

System.Data.Entity.Infrastructure.DbUpdateException: An error occurred while saving entities that do not expose foreign key properties for their relationships. The EntityEntries property will return null because a single entity cannot be identified as the source of the exception. Handling of exceptions while saving can be made easier by exposing foreign key properties in your entity types. See the InnerException for details.
---> System.Data.Entity.Core.UpdateException: An error occurred while updating the entries. See the inner exception for details.
---> System.Data.SqlServerCe.SqlCeException: An overflow occurred while converting to datetime.

I had 250 of these exceptions after finishing a job list of ~400 members (~20000 images). After running the same list again (which took far less time because most images were already downloaded), I got only 2 of these exceptions. They didn't prevent the files from being downloaded correctly, though.

Tested with Concurrent job = 5.

Nandaka commented 10 years ago

> An overflow occurred while converting to datetime.

I'm not sure why this one keeps happening when using Concurrent Job > 1. I've checked the log file: it can parse the work date and store it in the object, but somehow it tries to save the date to the DB as DateTime.MinValue...
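For context on why a minimum date overflows: the SQL Server family `datetime` type starts at 1753-01-01, while .NET's `DateTime.MinValue` is 0001-01-01, so such a value cannot be converted. A defensive guard before saving could look like the Python sketch below. The name `to_db_datetime` is hypothetical, and the sketch assumes the date column is allowed to be NULL, which the thread does not confirm.

```python
from datetime import datetime
from typing import Optional

# Lower bound of the SQL Server family `datetime` type.
SQL_DATETIME_MIN = datetime(1753, 1, 1)


def to_db_datetime(value: Optional[datetime]) -> Optional[datetime]:
    """Return `value` if the database can store it, otherwise None.

    Assumption: the target column is nullable, so an out-of-range or missing
    date is stored as NULL instead of triggering an overflow on save.
    """
    if value is None or value < SQL_DATETIME_MIN:
        return None
    return value
```

This only hides the symptom; it does not explain why the parsed date would be replaced by the minimum value in the first place.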

reyaz006 commented 10 years ago

May I ask what the purpose of the database is? It's under 200 KB for me, while my job list is huge. I thought it was supposed to remember the metadata of all downloaded images so that next time it knows what to skip. Even if you keep image ids as dwords, the file contains too many zero bytes.

Are you trying to remember server time for the images? In this case, are you sure this is needed? Since a server filename contains the actual upload date for each image, doesn't it mean that updated images would have different filenames too? Furthermore, I've checked and didn't find an option to edit published images. Member may delete an item or add tags only, it seems.

Nandaka commented 10 years ago

> knows what to skip

Yep (in the future). For now, it shows already-downloaded images at 50% opacity.

> Even if you keep image ids as dwords, the file contains too many zero bytes.

That is the default format of MS SQL Compact, not mine. Both MemberId and ImageId are saved as Int.

> server time for the images

It is more the work time when the image was uploaded (the time shown on the image page), so that I can easily detect changes to the image if possible. Also, another user may ask for this feature (at least, that is what I got from PixivUtil).

Anyway, there is an option to turn off saving to the DB under Settings -> Downloads tab.

reyaz006 commented 10 years ago

Thanks for info.

Still, you may want to consider fixing the logic for detecting changes in images. If Nijie doesn't allow anyone to edit already uploaded files (that was my conclusion after checking), then it's useless.

I'll explain the database size thing I have: My Database.sdf size is 196608 bytes now. My batch list results in downloading of ~19000 images. After manually removing all zero [00] bytes from the file it reduced in size to 12682 effective bytes. This clearly means it doesn't hold the info about every downloaded image.
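The zero-byte check described above can be reproduced with a few lines of Python. This is a rough sketch with hypothetical function names; it only counts bytes and knows nothing about the SQL CE file format, so a low count hints at little stored data but does not prove it.

```python
from pathlib import Path


def count_nonzero_bytes(data: bytes) -> int:
    """Return how many bytes in `data` are not 0x00."""
    return len(data) - data.count(0)


def nonzero_bytes_in_file(path: str) -> int:
    """Read the whole file at `path` and count its non-zero bytes."""
    return count_nonzero_bytes(Path(path).read_bytes())
```

Calling `nonzero_bytes_in_file` on the Database.sdf file would give the "effective bytes" figure to compare against the total file size.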

Checked with http://sourceforge.net/projects/compactview/ - my file only contains 1 image id and 1 member id with some info and urls, plus app-related info.

My guess is it isn't working as expected, or it only saves information and actual thumbnail data for tasks which I don't perform, like fetching images and member pages. I only use batch download for now.

Nandaka commented 10 years ago

I don't use the date to detect changes for now, only the ID.

> Size thing

I suspect this is to avoid fragmentation when you have a lot of data; I won't fix this. Also, I save the info to the DB only when the image from the job has been downloaded successfully.

Refer to: https://github.com/Nandaka/NijieDownloader/blob/master/NijieDownloader.UI/MainWindow.JobHelper.cs#L276

I'm still checking why the DateTime value is changed to DateTime.MinValue.