Berea-CS-Courses / capstone-project-schweinsbergs

capstone-project-schweinsbergs created by GitHub Classroom

D5: End To End Testing: Limited # of Scrapes #29

Open schweinsbergs opened 3 years ago

schweinsbergs commented 3 years ago

Interesting bug I found: while trying to supplement my dataframe, I hit a bug that only allows me to scrape ~350 times per run. It doesn't crash or raise an error, it just... stops scraping. The process finishes with an exit code that other people have reported for various reasons. Some blame memory issues, some blame PyCharm, some blame certain versions of Python, etc.

I'm gonna tinker with this one and I might end up shelving it. A workaround may be refactoring some code, though. Either way, this may be a time-consuming bug.

schweinsbergs commented 3 years ago

Starting at cell 2 (cell 1 is the first bit of the sitemap, unusable) will only let you reliably scrape to cell 345. Scraping from cell 500 let me go up to cell 870. I'm leaning towards this potentially being a memory issue...
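Since each run only gets through a few hundred cells, one possible workaround is to scrape in small batches and append each batch to disk, so a run that stops early loses at most one batch. This is only a sketch under assumptions: `scrape_one`, `scrape_in_batches`, `chunked`, the `url`/`content` columns, and the CSV output are hypothetical and are not taken from the project's code.

```python
import csv
from pathlib import Path
from typing import Callable, Iterator, List, Tuple


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def scrape_in_batches(
    urls: List[str],
    scrape_one: Callable[[str], Tuple[str, str]],  # hypothetical scraper callback
    out_path: str,
    batch_size: int = 100,
) -> int:
    """Scrape `urls` batch by batch, appending rows to a CSV after each batch.

    Returns the number of rows written during this call.
    """
    path = Path(out_path)
    write_header = not path.exists()  # only write the header for a new file
    written = 0
    for batch in chunked(urls, batch_size):
        rows = [scrape_one(url) for url in batch]
        # Re-open in append mode per batch so finished work is flushed to disk.
        with path.open("a", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            if write_header:
                writer.writerow(["url", "content"])
                write_header = False
            writer.writerows(rows)
        written += len(rows)
    return written
```

A later run could then pass only the URLs that are not yet in the CSV, which would match the observation that starting from a later cell gets further into the sitemap.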

schweinsbergs commented 3 years ago

Restarted the computer per a Stack Overflow recommendation. Same exit code: Process finished with exit code -1073741571 (0xC00000FD)
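For reference, the negative number and the hex value in that message are the same 32-bit value, and on Windows 0xC00000FD is the status code STATUS_STACK_OVERFLOW. A minimal sketch of the conversion follows; the helper name `to_ntstatus` and the small lookup table are illustrative and not part of the project.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned hex status string."""
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


# Two well-known Windows status codes; the table is illustrative, not exhaustive.
KNOWN_STATUSES = {
    "0xC00000FD": "STATUS_STACK_OVERFLOW",
    "0xC0000005": "STATUS_ACCESS_VIOLATION",
}

code = to_ntstatus(-1073741571)
print(code, KNOWN_STATUSES.get(code, "unknown"))  # prints: 0xC00000FD STATUS_STACK_OVERFLOW
```

That status generally means the process ran out of call-stack space rather than heap memory, so deep recursion in the scraper or in a library it calls would be one thing to check. That is a guess, not something confirmed in this thread.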

schweinsbergs commented 3 years ago

The more I run the code (I was running scrapes again to try to append the URL), the fewer scrapes it does.

alfarozavalae commented 3 years ago

Wow! Seems like this one is giving you a headache! Let me check with the professors about what this could be, or whether it is even feasible to try to fix it at this point.

alfarozavalae commented 3 years ago

I talked to Mario about this, and he says that since you have other bugs you are working on, you can focus on those for now and come back to this one if you have time. You already have information that you are displaying, so it is not super urgent to work on this one, right?