bubonic / TGC.bundle

The Great Courses Agent for Plex

Agent stopped retrieving metadata #4

Closed sonnyvivanov closed 3 years ago

sonnyvivanov commented 3 years ago

Hi,

The agent stopped retrieving metadata when I tried today. Was working fine last week. I'm running Plex 1.21.1.3830 on my Synology NAS.

Embedding the tgc log. Thanks for looking into this.

com.plexapp.agents.tgc.log

purebloke commented 3 years ago

I'm having the same issue. It worked last week, but hasn't worked the last 3 days. I'm running Plex 1.21.1.3830 on a custom Windows 10 server.

bubonic commented 3 years ago

Thanks for letting me know. I'll take a look at it after the Holidays.

bubonic commented 3 years ago

From the nice log that @sonnyvivanov provided, it looks like the agent is no longer retrieving the lecturer names and quitting with an error. Could be that the TGC website changed their HTML and that the agent is getting a null result.

Like I said, I can fix this after the Holidays. Hold tight and thanks for the log.

purebloke commented 3 years ago

Thanks so much for taking the time. This is so incredibly helpful and greatly appreciated. Happy Holidays!

garbled1 commented 3 years ago

I tried to update to the latest version, but now I'm just getting:

2020-12-26 14:18:51,895 (7fba8f22ac00) : INFO (init:24) - Can't continue, need the dryscrape module to continue

Which I can't get installed because webkit is broken... and both it and dryscrape are archived, dead code. :(

bubonic commented 3 years ago

@garbled1 Very true. I used dryscrape initially for parsing the TGC+ website, but discontinued that some time ago and never removed the dependency. The main reason TGC.bundle is not retrieving metadata is that it quits when trying to find data on the Professor/Lecturer. I took a quick look at the website last night and they are now using JavaScript to render that part of the page. So, when I look for the old div/class with BeautifulSoup it returns None, and the agent quits when trying to add it to the metadata.
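The failure described above (a lookup that returns None, followed by a method call on the result) can be guarded against with a small helper. This is only a sketch: `text_or_default` and `FakeTag` are hypothetical names, and `FakeTag` merely stands in for a BeautifulSoup tag so the example runs without any third-party package.

```python
class FakeTag(object):
    """Stand-in for a BeautifulSoup Tag (hypothetical, for illustration only)."""

    def __init__(self, text):
        self._text = text

    def getText(self):
        return self._text


def text_or_default(node, default=""):
    """Return the node's stripped text, or a default when no node was found.

    soup.find() returns None when nothing matches, so calling .getText()
    on the result without this check raises AttributeError.
    """
    if node is None:
        return default
    return node.getText().strip()
```

In the agent, the result of `soup.find(...)` would be passed in place of `FakeTag`, so a missing lecturer block yields a placeholder instead of a crash.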

There is a relatively new javascript renderer/scraper called requests that I'll be implementing in this agent. With any luck we should get the TGC+ metadata too, assuming all goes well. Like I said in an earlier post, hang tight and I should get everything back to normal around the same time next week as the time of this post.

willemsjawt commented 3 years ago

@bubonic You're the best, thanks man!

bubonic commented 3 years ago

Everything in the agent appears to be broken. The Great Courses website uses JavaScript that most renderers can't handle. I think the alternative is to use Selenium, which would mean a major rewrite of most of the code. Anyone willing to take this on has my support.

I'm working on a few other projects at the moment, but I will see what I can do with this agent to get it functioning again over time. No promises though!

Sorry all :(

purebloke commented 3 years ago

I understand, but so sad. It would be worth money to me to get it working if it were possible. $$$ donation your way if you can spare the time!

bubonic commented 3 years ago

Well, I did give up for a second. Then I took a walk around the block and as usual, I can't stand when stuff doesn't work when I know that it can. So, I sat back down and drafted up some quick code with Selenium and the headless geckodriver (i.e., Firefox):

import os
import random
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# Written against the Selenium 3.x API.
options = Options()
options.headless = True  # run Firefox without a visible window
profile = webdriver.FirefoxProfile()
profile.set_preference("javascript.enabled", True)
profile.set_preference("general.useragent.override", "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0")
driver = webdriver.Firefox(firefox_profile=profile, options=options, executable_path=os.path.abspath("/usr/local/bin/geckodriver"))

URL = "https://www.thegreatcourses.com/courses/language-and-the-mind"

#driver = webdriver.Chrome(executable_path=os.path.abspath("chromedriver"), chrome_options=chrome_options)
#driver.set_window_size(1120, 550)
driver.get(URL)
# Random pause so the JavaScript-rendered content has time to load.
time.sleep(random.randint(10, 22))
HTML = driver.page_source
driver.quit()

soup = BeautifulSoup(HTML, features="html.parser")
ProfessorBlock = soup.find('div', {'class': 'ProductPage-Professor-Info'})
# find() returns None when the div is missing, so check before using it.
if ProfessorBlock is not None:
    print(ProfessorBlock.getText())

and got the following:

bubonic@bubonicX230-SSD:~/eclipse-workspace/TGC.bundle$ ./testSoup.py 
Spencer Kelly, PhDLanguage is the ultimate human invention; the tool that makes all other tools.InstitutionColgate UniversityAlma materUniversity of ChicagoLearn More About This Professor
bubonic@bubonicX230-SSD:~/eclipse-workspace/TGC.bundle$ 

So, it looks like we are back in business. I've coded non-headless Selenium Instagram and Facebook bots for my company, so this should be straightforward. Of course, the downside is that I have to implement random sleep times for the page to load, so this will significantly slow the agent down and will require a few more packages on the user side to be operational. Hopefully it all integrates into the PLEX environment too... we shall see. For now, I'm going to continue working on this and hope for the best. It will take some time to rewrite all the requests and new soup declarations.
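As an alternative to fixed random sleeps, a page load can be polled until some condition holds, which returns as soon as the content is ready. The sketch below uses only the standard library, and `wait_until` is a hypothetical helper, not part of the agent. Selenium ships its own `WebDriverWait` class for the same purpose, which is probably the better fit in real driver code.

```python
import time


def wait_until(condition, timeout=20.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or None if the timeout expires first.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            return None
        time.sleep(interval)
```

With a Selenium driver, the condition might be a lambda that checks whether `'ProductPage-Professor-Info'` appears in `driver.page_source`.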

P.S. Donations are certainly welcomed, if you feel obliged. -bub

purebloke commented 3 years ago

That's fantastic and looks very promising. I'm looking forward to the final package. Certainly leave a link for a donation or PM me and I'll send something your way. We all benefit from valuable projects like this and we know that it takes time and effort to make them happen. Thank you!

sonnyvivanov commented 3 years ago

Thanks a lot @bubonic, hope you get it working again 🙏

Supertramp78 commented 3 years ago

Let me know how I can donate as well. This would be well worth it to me.

bubonic commented 3 years ago

I appreciate the support friends. Right now I'm error chasing to get all the components working in harmony. I'll post somewhere on how to donate once I get it working once more.

Thanks again.

Supertramp78 commented 3 years ago

Are you pulling data from the Great Courses site or the Great Courses Plus site? Reason I ask is I put (TGC####) after the name for all directories so the plugin can find it but there are GC Plus exclusives that don't seem to have course numbers and aren't on the GC site. Any suggestions for those or should I just do them manually? Not exactly sure what your program can do.

bubonic commented 3 years ago

For the courses, I am pulling from the regular site. TGC+ stopped working during the middle of last year. If all goes well, I should be able to pull data from both sites. For the most part, it will pull course numbers from the TGC site and maybe down the road I'll have it search TGC+ for the extra lectures.

Right now I'm just focusing on getting it working as before.

bubonic commented 3 years ago

good-news-everyone.jpg

I figured I would provide an update. A week ago I ran into a lot of errors trying to get the firefox/selenium/geckodriver stack to work. I had no ambition left for error chasing and halted until today. After much trial and error, I worked out the correct version dependencies, at least for the PLEX version I have installed on my server. I'll update to the latest version later this week and test to make sure.

Once I got the correct stack working, there were multiple shared libraries that will have to be bundled with the Agent. No extra work on the user of course. However, this is working for a headless Linux server using Xvfb. I don't have a windows computer that I can test this on, so when I update the repo, I will need feedback from my Windows users. I'm leaning towards it not working right away on Windows machines, but I could be wrong.

I successfully retrieved the Course Description from the new TGC website and updated it in PLEX, with unicode characters and much better formatting this time. It's now only a matter of rewriting the HTML parser (BeautifulSoup) conditions. That doesn't sound bad, but there are a lot of them and each step requires a Selenium .click() with a random wait to allow for page loads. There are a bunch of "clicks" to be made now, as they have embedded the content in JavaScript.

I was basically correct when I said the Agent had to be totally rewritten. In the upcoming weeks, if time permits after work, I will build the necessary "clicks" and parsers to get back to a somewhat functioning agent that can be released on the repo for users to try and provide feedback.

I estimate another two or three weeks (as I'm working on this in my sparse free time) to get back to having the same metadata we had before it broke. The good news is that the result will be a lot more error handling and cleaner code. However, it will take significantly longer to process the metadata for each course, on the order of 4-6x longer.

TGC+ artwork and other metadata will be added down the road, but not in the first couple releases.

With that, we are on the road to recovery.

Best, -bub

purebloke commented 3 years ago

That IS good news! Thank you so much for all your work and efforts on this! I am looking forward to the release and will help and support in any way possible.

Supertramp78 commented 3 years ago

Same here. If the agent takes time, I don’t care. It has to be less time than me doing it by hand. And yes, I’ll chip in too as soon as I find out how.

sonnyvivanov commented 3 years ago

Sounds great, Bub, thanks 👍

willemsjawt commented 3 years ago

Hai @bubonic,

How is the project going?

If you instruct me a bit I would love to help you with the .click in Selenium.

❤️

bubonic commented 3 years ago

What OS would you be running this on? I have most of the Selenium driver code written and it's retrieving most of the metadata; however, it doesn't seem to be updating it in PLEX and I'm unsure as to why.

I'm going to try this on another server and see if that works. If you would like to beta test and possibly contribute, let me know and I'll update the git repo. Right now, I think it only works on Linux... Might work on Windows, I dunno though. Some OS checks need to be put in place to make sure the correct code runs.

Also, I need to write a Search function to find courses that don't have a direct link from my 'url guesser' code.

*Edit: it was updating PLEX at first, now it seems to have stopped and it's really bugging me

bubonic commented 3 years ago

Got it. Having an issue with poster art causes no metadata to load. Once I figure out loading the poster art, I will update repo for all to try.

Hint: I think you all will like what I did for the poster art.

Supertramp78 commented 3 years ago

Kind of curious about art since PLEX only wants vertical posters while TTC seems to only have landscape art these days.

bubonic commented 3 years ago

Just got it working. Still a few things I need to do so the agent doesn't, hmmm, how do I say this,... Freeze your box after updating several courses due to selenium and multiple firefox instances.

preview

preview.png

preview2.png

I do some image processing in the agent to create a nice poster locally.
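The thread doesn't say how the poster is built, so the following is only one plausible piece of such processing, not the agent's actual method: the arithmetic for placing a landscape image on a portrait canvas. The 1000x1500 canvas size and the function name are assumptions made for illustration.

```python
def fit_landscape_on_poster(src_w, src_h, poster_w=1000, poster_h=1500):
    """Scale an image to the poster width and centre it vertically.

    Returns (new_w, new_h, x_offset, y_offset) in pixels. Intended for
    landscape sources; a very tall source would overflow the canvas.
    """
    scale = float(poster_w) / src_w
    new_w = poster_w
    new_h = int(round(src_h * scale))
    y_offset = (poster_h - new_h) // 2
    return new_w, new_h, 0, y_offset
```

The returned size and offsets could then be handed to whatever image tool does the resizing and compositing.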

willemsjawt commented 3 years ago

@bubonic Windows. And yes, I would love to test it out.

You’re the best! Really appreciate the work you’re doing and don’t forget that donation link. 😉

J.A.W.T. Willems


bubonic commented 3 years ago

Alrighty then. I'll be updating the repo Friday or Saturday with a working model.

I got the Firefox instances down to one, with multiple tabs per course update. Tabs close after serialization to keep memory usage low, so it is unlikely to freeze your box. Also, there are OS and dependency checks, and I have a feeling it will work with Windows boxes. So, all my Windows users, I'll be looking to you for feedback.

The updated repo will be without the SearchCourse() function for a short while. As long as you have your courses named exactly like the course URL on the TGC website you'll be fine. I know @Supertramp78 does that and we should really all follow his example. I do plan on implementing a very accurate search function based on the Levenshtein algorithm that I've used elsewhere. But if any of you know the pain of creating search functions, you'll know why I'm postponing that for a week or slightly longer.
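For readers unfamiliar with it, the Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a minimal sketch of the distance plus a closest-match lookup. `closest_title` is a hypothetical helper and says nothing about how the planned SearchCourse() will actually work.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_title(query, candidates):
    """Return the candidate with the smallest case-insensitive distance."""
    return min(candidates, key=lambda c: levenshtein(query.lower(), c.lower()))
```

A real search would likely also want a maximum acceptable distance, so that a course with no good match is reported as not found rather than matched to something unrelated.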

All that is left to do is an updated README and a search function and TGC+ in future updates. I will post donation links in the README when I update the repo this weekend.

PREVIEW

Screenshot_20210122-023937.png

Screenshot_20210122-025353.png

Screenshot_20210122-025400.png

Screenshot_20210122-025418.png

purebloke commented 3 years ago

That's great! I can't wait to try it out. Again, our sincere thanks for pushing forward on getting this working! I'll be sure to give feedback on the Windows platform.

Supertramp78 commented 3 years ago

for clarification regarding search, are you saying the (TGC37737) addition to the folder name won't do any good for now? Just go back to mirroring the URL?

Supertramp78 commented 3 years ago

For those who don't know what the URL naming method is, this is what I wrote almost four years ago!

The files themselves can be named:

S01E01 - text
S01E02 - text

You don’t need the name of the lecture in front of the S0xE0x and once you see below you will be glad.

You DO need the name of the lecture as the folder name and it needs to be the exact text as it reads in the URL, not the lecture name. For example…

African Experience from “Lucy” to Mandela - won’t work
African Experience from Lucy to Mandela - won’t work
African Experience from quot Lucy quot to Mandela - works just fine. This is how the lecture is written out in the URL.

http://www.thegreatcourses.com/courses/african-experience-from-quot-lucy-quot-to-mandela.html

Other examples include:

Experiencing America A Smithsonian Tour through American History - won’t work despite the fact that it is the title
Experiencing America A Smithsonian Tour through History - does work because that is what the URL is.
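The naming rule above amounts to this: the folder name, lower-cased and with its words joined by hyphens, should equal the last part of the course URL. A rough sketch of that mapping follows. The function names are hypothetical, the URL prefix is copied from the examples in this thread, and the agent's own 'url guesser' may well differ.

```python
import re


def folder_name_to_slug(folder_name):
    """Lower-case the name and join its alphanumeric words with hyphens."""
    words = re.findall(r"[A-Za-z0-9]+", folder_name)
    return "-".join(word.lower() for word in words)


def guess_course_url(folder_name):
    """Build a candidate course URL (prefix assumed from this thread)."""
    return "https://www.thegreatcourses.com/courses/" + folder_name_to_slug(folder_name)
```

This also shows why the "quot Lucy quot" folder name works: those words end up in the slug exactly as they appear in the site's URL.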

bubonic commented 3 years ago

Having the (TGC37737) will be very effective when the SearchCourse() function is implemented. I suggest following the URL name and adding the (TGC####) to each directory/file as well. The agent will parse out the (TGC####) anyway for later implementation, so having it will not hurt at all and will help in the future.
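Parsing the (TGC####) tag out of a folder name can be sketched with a regular expression. The helper name below is hypothetical, and the pairing of title and number in the usage note is only an example, not a real course number for that title.

```python
import re

# Matches a course tag such as "(TGC37737)" anywhere in a folder name.
TGC_TAG = re.compile(r"\(TGC(\d+)\)")


def split_course_tag(folder_name):
    """Return (name_without_tag, course_number), or (name, None) if untagged."""
    match = TGC_TAG.search(folder_name)
    if match is None:
        return folder_name.strip(), None
    name = (folder_name[:match.start()] + folder_name[match.end():]).strip()
    return name, match.group(1)
```

For instance, `split_course_tag("Language and the Mind (TGC37737)")` separates the URL-style name from the number, so each can be used for a different lookup.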

Thanks for the clarification

bubonic commented 3 years ago

Also!

The agent has been updated.

Please download and install and have fun! FIRST, *READ* the README, as it tells you what is required for setup.

Edit: donation link is in the README.

Supertramp78 commented 3 years ago

Quick question. You mentioned Firefox 60. I've got the most recent version of Firefox, which is 80.something. Will that work? Or is 60 a different animal? This is for Windows 10.

bubonic commented 3 years ago

You need firefox 60.0. Use the link in the README to download the binary and install it in a separate location. I tested this with Firefox 80-84 and those don't work with the python packages that the Agent uses.

Make sure to edit the one line for the location of Firefox 60.0.

Feel free to try different versions of Firefox. For me 60.0 is what worked based on minimum dependency guidelines.

Supertramp78 commented 3 years ago

OK, got Firefox 60. Wrote in the link on line 38. Got ImageMagick and installed it. Downloaded geckodriver and ran it, and all I get is a blank window. Looks like a command line but nothing happens. Is that it? Or am I missing something?

bubonic commented 3 years ago

I'm unclear exactly how you proceeded. Don't run geckodriver. Just install it and the Agent should find where it is located on your computer. Just to clarify: for Windows users, the link on line 38 should be the location of the Firefox 60 binary, i.e.,

C:\firefox60\firefox.exe

Install the agent as you did before and see if the agent appears in the "Agents" section of "TV Shows" within your PLEX interface. I apologize for having more steps to do, but that's what we are left with now that the TGC website is using JS.

bubonic commented 3 years ago

Maybe geckodriver for windows doesn't have an install wizard. If that's the case just edit the line:

GECKODRIVER = "/usr/local/bin/geckodriver"

And change the part in quotes to wherever you have geckodriver.exe located. Otherwise, I don't think the agent will find it. Possibly an extra step. I could be of more help if I had a Windows box, but I don't so meh. :)
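Choosing the driver path per operating system could look roughly like the sketch below. The Linux path is the one quoted in this thread; the Windows path is a made-up example and the function names are hypothetical, so treat this as an illustration rather than the agent's actual OS check.

```python
import os
import platform

# The Linux path comes from this thread; the Windows path is a made-up example.
DEFAULT_GECKODRIVER = {
    "Linux": "/usr/local/bin/geckodriver",
    "Windows": r"C:\geckodriver\geckodriver.exe",
}


def geckodriver_path(system=None, override=None):
    """Pick a geckodriver location for the given OS name.

    system defaults to platform.system(); a non-empty override always wins.
    """
    if override:
        return override
    system = system or platform.system()
    if system not in DEFAULT_GECKODRIVER:
        raise RuntimeError("No default geckodriver path for " + system)
    return DEFAULT_GECKODRIVER[system]


def geckodriver_available(path):
    """True if the file exists, so a missing driver can be reported early."""
    return os.path.isfile(path)
```

Checking `geckodriver_available()` before starting Selenium would let the log say plainly that the driver is missing, instead of failing deeper inside the update.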

Supertramp78 commented 3 years ago

Well, for now I'm just trying to get my server to work again. After doing all this it wouldn't start. The PLEX server said it was running but nothing could connect to it. Reinstalled it and no go. So now I'm restoring from a backup to at least get back to normal. LOL. I may try again tomorrow. Otherwise I may see if you are available for a Teams meeting or something so I can show you what is going on.

bubonic commented 3 years ago

Oh no! I hope you get everything back to normal. I've had my fair share of PLEXITIS. In fact, not due to the agent, I deleted my entire Plex install and started from scratch while coding this. I have a couple of servers and had only about 150 of the courses loaded before I put the kibosh on the whole thing and started with a clean install to make sure I was doing everything right.

PLEX is a nasty animal when it isn't working right. Good luck and let me know how I can help.

Supertramp78 commented 3 years ago

Acronis is my friend. 👍

Supertramp78 commented 3 years ago

FYI, it's all back to normal. Acronis really is my friend. I've got it on all my PCs. I may give this another try later.

bubonic commented 3 years ago

Niice. If you do try it, re-download the agent. I made a small code change that would affect it working on Windows.

Supertramp78 commented 3 years ago

Well since my restore date was the previous day, I'm going to have to re-download EVERYTHING! ROFL!!

willemsjawt commented 3 years ago

Looks like it's connecting but doesn't pull the artwork. com.plexapp.agents.none.log

Hope you can do something with the log. It also shows agents.none for some reason. Windows 10Pro 64bit

Edit: I did add both paths to Firefox 60.0 and geckodriver.exe

bubonic commented 3 years ago

That log doesn't tell me anything and looks completely different than the logs that I get from the agent. Not sure how to proceed with that. The Agent should most certainly register as TGC.

bubonic commented 3 years ago

A possible scenario is that you copied the git repo directly to the PLEX plugin folder. You need to go one directory deeper and copy the second TGC.bundle directory instead of the first. The layout came out weird when I created this repo, and I haven't changed it since.
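One way to tell the two directories apart is to look for the one that actually contains plug-in files. The sketch below assumes the usual Plex plug-in layout, in which a bundle directory holds `Contents/Info.plist`. `find_plex_bundles` is a hypothetical helper, not something shipped with the agent.

```python
import os


def find_plex_bundles(root):
    """Return every *.bundle directory under root that has Contents/Info.plist."""
    found = []
    for dirpath, dirnames, _filenames in os.walk(root):
        for name in dirnames:
            candidate = os.path.join(dirpath, name)
            plist = os.path.join(candidate, "Contents", "Info.plist")
            if name.endswith(".bundle") and os.path.isfile(plist):
                found.append(candidate)
    return found
```

Run against a fresh clone of the repo, this should report only the inner TGC.bundle, which is the directory to copy into the plug-ins folder.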

sonnyvivanov commented 3 years ago

Hi @bubonic,

Requesting your assistance to get it running on Linux (Synology NAS).

I installed ImageMagick, Geckodriver, Firefox 60 (in the required folder). The agent doesn't retrieve metadata.

Log: com.plexapp.agents.tgc.log

I think I'm missing your requirement "X11 or Xvfb (headless)". I'm a Linux novice so I didn't know what any of these meant until today.

Researching in google I see people recommend I use apt-get to install Xvfb. But my Synology NAS doesn't have that package manager, nor can I install it. I have opkg, but it doesn't appear to have Xvfb. I am willing to sideload it, but can't find a link.

Can you suggest how to proceed? Regards,

bubonic commented 3 years ago

@sonnyvivanov All I know about Synology NAS is it runs a highly modified version of linux. I'm quite unsure how to install xvfb on there if it's not finding the package. Be careful if you sideload anything. But, as the log shows, the agent is trying to quit because Xvfb is not loaded and fails on one of the first lines of update()

Do some googling around on how to get Xvfb on your Synology. Hopefully you find something.

bubonic commented 3 years ago

@sonnyvivanov

This might be of some help:

https://www.sindastra.de/p/1601/what-is-synology-lxqt-a-hidden-gem/

bubonic commented 3 years ago

I think that because PLEX has a built-in Python 2.6 distro, there is no way around using Xvfb or some other virtual display with Selenium. Don't quote me on that though.
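For anyone setting up the virtual display by hand, starting Xvfb from Python might look like the sketch below. The display number `:99`, the screen size, and both function names are arbitrary choices for illustration; whether the agent starts Xvfb itself this way is not stated in the thread.

```python
import os
import subprocess


def xvfb_command(display=":99", width=1280, height=1024, depth=24):
    """Build the command line for a virtual X server on the given display."""
    screen = "{0}x{1}x{2}".format(width, height, depth)
    return ["Xvfb", display, "-screen", "0", screen]


def start_xvfb(display=":99"):
    """Start Xvfb and point DISPLAY at it; returns the Popen handle.

    Raises OSError if the Xvfb binary is not installed.
    """
    process = subprocess.Popen(xvfb_command(display))
    os.environ["DISPLAY"] = display
    return process
```

The caller should keep the returned handle and call `terminate()` on it once the browser has quit, so that no stray Xvfb processes are left running.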