DobyTang / LazyLibrarian

This project isn't finished yet. Goal is to create a SickBeard, CouchPotato, Headphones-like application for ebooks. Headphones is used as a base, so there are still a lot of references to it.

Import CSV now random #1572

Closed dkids closed 5 years ago

dkids commented 5 years ago

```
/usr/bin/curl -q "http://localhost:5299/api?apikey=API_KEY&cmd=importCSVwishlist&wait"
```
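The same call can be wrapped in a small Python helper. This is a sketch, assuming LazyLibrarian's default port 5299 and a placeholder API key; the `wait` parameter (as in the curl command) is passed through unchanged.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "http://localhost:5299/api"  # default LazyLibrarian port; adjust if changed


def import_csv_url(apikey: str) -> str:
    # Build the importCSVwishlist API URL; "wait" asks the API not to
    # return until the import has run (mirroring the curl call above).
    params = urlencode({"apikey": apikey, "cmd": "importCSVwishlist", "wait": ""})
    return f"{BASE}?{params}"


# To actually fire the request (requires a running LazyLibrarian instance):
# urlopen(import_csv_url("API_KEY")).read()
```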

I use a script to download the NY Times bestseller lists. It splits each list into one-book files (using `split -l 1`), then for each fragment writes a header line of "Title, Author" to a new CSV file, appends the fragment to it, and deletes the fragment.

I used to import the lists unsplit, but then I had to figure out which book was the problem. Sometimes it's easy, such as when an author's name has been butchered by my text processing, or when the list uses "F*ck" in the title while the book spells out the word.

The way this should work is that LL processes xaa.csv, xab.csv, etc. If there is a problem, LL skips that file and I can then examine the problem file to see what was up.

I suddenly have two problems. First, LL gets stuck reprocessing the same file:

```
28-Sep-2018 12:02:06 - INFO :: API : csvfile.py:import_CSV:390 : Added 0 new authors, marked 0 books as 'Wanted', 1 book not found
28-Sep-2018 12:02:06 - WARNING :: API : csvfile.py:import_CSV:399 : Not deleting /Users/kids/Dropbox/lldata/xen.csv as not all books found
28-Sep-2018 12:02:12 - WARNING :: API : csvfile.py:import_CSV:377 : Skipping book POPULARMMOS PRESENTS A HOLE NEW WORLD by Pat, Failed to import 37557193
28-Sep-2018 12:02:12 - INFO :: API : csvfile.py:import_CSV:387 : Found 1 book in csv file, 0 already existing or wanted
28-Sep-2018 12:02:12 - INFO :: API : csvfile.py:import_CSV:390 : Added 0 new authors, marked 0 books as 'Wanted', 1 book not found
28-Sep-2018 12:02:12 - WARNING :: API : csvfile.py:import_CSV:399 : Not deleting /Users/kids/Dropbox/lldata/xen.csv as not all books found
28-Sep-2018 12:02:18 - WARNING :: API : csvfile.py:import_CSV:377 : Skipping book POPULARMMOS PRESENTS A HOLE NEW WORLD by Pat, Failed to import 37557193
28-Sep-2018 12:02:19 - INFO :: API : csvfile.py:import_CSV:387 : Found 1 book in csv file, 0 already existing or wanted
28-Sep-2018 12:02:19 - INFO :: API : csvfile.py:import_CSV:390 : Added 0 new authors, marked 0 books as 'Wanted', 1 book not found
28-Sep-2018 12:02:19 - WARNING :: API : csvfile.py:import_CSV:399 : Not deleting /Users/kids/Dropbox/lldata/xen.csv as not all books found
28-Sep-2018 12:02:25 - WARNING :: API : csvfile.py:import_CSV:377 : Skipping book POPULARMMOS PRESENTS A HOLE NEW WORLD by Pat, Failed to import 37557193
28-Sep-2018 12:02:25 - INFO :: API : csvfile.py:import_CSV:387 : Found 1 book in csv file, 0 already existing or wanted
28-Sep-2018 12:02:25 - INFO :: API : csvfile.py:import_CSV:390 : Added 0 new authors, marked 0 books as 'Wanted', 1 book not found
28-Sep-2018 12:02:25 - WARNING :: API : csvfile.py:import_CSV:399 : Not deleting /Users/kids/Dropbox/lldata/xen.csv as not all books found
```

The other problem is that it picks files apparently at random. They are neither the oldest files nor the newest:

```
xaa.csv xaw.csv xbs.csv xco.csv xdp.csv xet.csv xgh.csv xhk.csv xit.csv xkg.csv
xab.csv xax.csv xbt.csv xcp.csv xdq.csv xeu.csv xgi.csv xhp.csv xiu.csv xkh.csv
xat.csv xbp.csv xcl.csv xdh.csv xeq.csv xfz.csv xhf.csv xiq.csv xjy.csv xlf.csv
xau.csv xbq.csv xcm.csv xdi.csv xer.csv xgd.csv xhg.csv xir.csv xjz.csv
xav.csv xbr.csv xcn.csv xdj.csv xes.csv xge.csv xhj.csv xis.csv xkf.csv
```

I'm fairly but not 100% confident that it used to do it in directory order.

philborman commented 5 years ago

Yes, it is random, and always has been. The CSV importer only expects to find one CSV file in the import directory, which it may delete after a successful import, depending on config. It takes a list of files in the import folder and processes the first one it finds with a .csv extension. Depending on your OS and the way it allocates directory entries, with no sorting specified the order can be fairly random. Also, if an import fails we don't delete the CSV, so you can manually inspect it to see what went wrong. Maybe we should rename the failed file so we don't try it again; that way we could process all CSV files in a folder, which is more like what you are expecting.

For importing a list like the NY Times best sellers you might be better off using the wishlist method. If you put the URL in as an RSS feed, LazyLibrarian will import all the books it can find in the list, then periodically check the list for changes and import any new ones. LL can do this from Goodreads Listopia feeds; see https://www.goodreads.com/list/tag/new-york-times. If your feed is not in Listopia format or doesn't work, send me the URL and I will take a look at it.

philborman commented 5 years ago

Added a dedicated wishlist type for NYTimes; URLs like https://www.nytimes.com/books/best-sellers/hardcover-fiction can now be used as a feed.