Aghababaei / PhD-Seeker

Finding latest fully funded PhD positions for international students through web scraping
GNU General Public License v3.0
84 stars, 11 forks

date issue with findaphd #9

Open javadr opened 2 years ago

javadr commented 2 years ago

Dear Amin,

Did you find a way to capture the date/time from the findaphd website?

I tested requests_html, but it doesn't work either, even when calling the render method!

Could you test other ways to solve this issue?

Best, Javad

Aghababaei commented 2 years ago

Dear Javad,

Since our last discussion, I have tried multiple methods, including requests_html, but none of them worked. Still, I think the solution lies in working with the re module alone.

Thank you for reminding me and for opening an issue. To address it, I will retest the potential solutions. As soon as I reach a satisfactory conclusion, I will update this thread.

Best, Amin

javadr commented 2 years ago

Actually, it is not related to the re module. When the code fetches the page, the date tag is empty; the date seems to be loaded afterwards via something like AJAX. In other words, the page content arrives without any date data, and the date fields are filled in later by a separate process. Consequently, the page we get has no date in it.
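To make the point above concrete, here is a minimal sketch of how one could confirm that the date tag is empty in the HTML as fetched, before any script runs. It uses only the standard library. The class name `date` and the sample markup are assumptions for illustration; the real findaphd markup may differ.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they are ignored for depth tracking.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextCollector(HTMLParser):
    """Collect the text inside every element carrying a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0    # > 0 while the parser is inside a matching element
        self.texts = []   # one entry per matching element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1            # nested element inside a match
        elif self.css_class in classes:
            self.depth = 1             # entering a matching element
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def empty_date_tags(html, css_class="date"):
    """Return one boolean per matching element: True if its text is blank.

    `css_class` is a hypothetical class name, not taken from the real site.
    """
    collector = ClassTextCollector(css_class)
    collector.feed(html)
    return [text.strip() == "" for text in collector.texts]


# Hypothetical markup: the first tag mimics what the plain fetch returns,
# the second what the browser shows after the date has been filled in.
sample = '<div><span class="date"></span><span class="date"> 1 May </span></div>'
print(empty_date_tags(sample))
```

If every entry comes back `True` for the page as downloaded, the date is not in the static HTML at all, so no regular expression applied to that text can recover it.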

requests_html has a render function that is useful in this kind of situation, and it worked well for me in another case, but here it does not work properly.
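For reference, the render attempt looks roughly like the sketch below. The URL and CSS selector in the usage comment are placeholders, not the real ones, and the wait time is a guess. As noted above, this approach did not populate the date on findaphd, so it only shows what was tried.

```python
def fetch_rendered_text(url, selector, wait_seconds=2.0):
    """Render `url` in headless Chromium and return the text of `selector`.

    Returns None if no element matches. The import is kept inside the
    function because requests_html is a third-party package
    (pip install requests-html) that downloads Chromium on first render.
    """
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url)
        # Run the page's JavaScript, then pause so late requests can finish.
        response.html.render(sleep=wait_seconds, timeout=20)
        element = response.html.find(selector, first=True)
        return element.text.strip() if element is not None else None
    finally:
        session.close()


# Hypothetical usage (placeholder URL and selector, needs network access):
# print(fetch_rendered_text("https://example.com/listing", ".date"))
```

An empty string returned here, as opposed to `None`, would mean the tag exists but is still blank after rendering, which matches what is described in this issue.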

TL;DR: the date tag is loaded with an empty string.