adbar / htmldate

Fast and robust date extraction from web pages, with Python or on the command-line
https://htmldate.readthedocs.io
Apache License 2.0

Unpin lxml #114

Closed: adamh-oai closed this 6 months ago

adamh-oai commented 7 months ago

Per https://github.com/adbar/trafilatura/issues/449, see if unpinning lxml works on macOS.

adbar commented 7 months ago

Hi @adamh-oai, the tests pass. The current issue is unrelated: a dummy site is down.

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (4088710) 99.07% compared to head (b8cc607) 99.07%.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #114   +/-   ##
=======================================
  Coverage   99.07%   99.07%
=======================================
  Files           8        8
  Lines         867      867
=======================================
  Hits          859      859
  Misses          8        8
```


adbar commented 7 months ago

Now it works for Python 3.12 but not for Python 3.8 on macOS.

mde-pach commented 6 months ago

I ran the tests locally and everything works well with the same version of Python on macOS (M2 architecture). Could these tests fail because of the content of the website during the run? Did you try to rerun them?

Edit: I just saw that the tests run on an x86 architecture; maybe the difference comes from there.
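To confirm whether a local run and a CI run really differ in architecture, each run can print its own environment. The sketch below uses only the standard library (`platform`, `sys`, `importlib.metadata`); the function name `environment_report` is hypothetical and is not part of htmldate or its test suite.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report() -> dict:
    """Collect the details that distinguish one test run from another (hypothetical helper)."""
    try:
        # Version of the installed lxml distribution, if it is installed at all.
        lxml_version = version("lxml")
    except PackageNotFoundError:
        lxml_version = None
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "os": platform.system(),        # e.g. "Darwin" on macOS
        "machine": platform.machine(),  # e.g. "arm64" on Apple silicon, "x86_64" on Intel
        "lxml": lxml_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running this both locally and as a CI step makes it easy to compare the Python version, CPU architecture and lxml version side by side. Note that an interpreter running under Rosetta 2 on an M-series Mac reports `x86_64`.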

adbar commented 6 months ago

Yes, I updated the tests. I'm not sure why the tests fail specifically on Python 3.8, but you're probably right in assuming that different architectures lead to different outcomes.

We could try adding other versions of Python on macOS; as far as I know, other architectures are not yet easily available on GitHub Actions.
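The thread does not say which settings and tests were finally changed. As one possible approach, a test known to be platform-sensitive can be skipped only for the failing combination using the standard library's `unittest.skipIf`. The class name, test names and the `IS_MACOS_PY38` condition below are hypothetical, and htmldate's own suite may use a different runner or mechanism.

```python
import sys
import unittest

# Hypothetical condition: the combination reported as failing in this thread.
IS_MACOS_PY38 = sys.platform == "darwin" and sys.version_info[:2] == (3, 8)


class ExampleTests(unittest.TestCase):
    def test_always_runs(self):
        # Runs on every platform and Python version.
        self.assertTrue(True)

    @unittest.skipIf(IS_MACOS_PY38, "known failure on macOS with Python 3.8")
    def test_platform_sensitive(self):
        # Placeholder for a test whose outcome differed across platforms.
        self.assertEqual("2021-03-15"[:4], "2021")


if __name__ == "__main__":
    unittest.main()
```

Skipping keeps the rest of the matrix green while documenting the known failure in the skip reason; the trade-off is that the skipped combination is no longer exercised at all.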

adbar commented 6 months ago

@adamh-oai I believe I found the right combination of settings and tests. What do you think?

adamh-oai commented 6 months ago

lgtm!

adamh-oai commented 6 months ago

Could you put up a new release on PyPI with this change? Thanks!

adbar commented 6 months ago

Yes, I plan to do it this week for htmldate and a bit later for trafilatura.