datactive / bigbang

Scientific analysis of collaborative communities
http://datactive.github.io/bigbang/
MIT License

Trouble running `pytest tests/webscraping/test_listserv.py` #563

Closed by agt24 2 years ago

agt24 commented 2 years ago

Hi there, I just discovered bigbang while searching for a listserv web archive scraper. It looks like exactly what I'm after, but I've not been able to get it working yet. I downloaded the 0.3.0 release and tried running:

pytest tests/webscraping/test_listserv.py

but most of the tests failed. (See below. Note that I did create an account at the IEEE site and put my credentials in config/authentication.yaml.)

If someone could share or point me toward a working listserv collection example, I could try to generalize it for my use case (list.nih.gov).

(bigbang0.3.0) ➜bigbang-0.3.0 #pytest  tests/webscraping/test_listserv.py
================================================================ test session starts =================================================================
platform darwin -- Python 3.7.7, pytest-7.1.1, pluggy-1.0.0
rootdir: /Users/adamt/proj/mailinglist_scrape/bigbang-0.3.0
collected 16 items

tests/webscraping/test_listserv.py FEF.EEFFFFFEEEE.                                                                                            [100%]

<snip>

============================================================== short test summary info ===============================================================
FAILED tests/webscraping/test_listserv.py::TestListservMessage::test__from_url_with_login - AttributeError: 'NoneType' object has no attribute 'par...
FAILED tests/webscraping/test_listserv.py::TestListservMessage::test__only_header_from_url - AttributeError: 'NoneType' object has no attribute 'pa...
FAILED tests/webscraping/test_listserv.py::TestListservList::test__from_url_with_login - IndexError: list index out of range
FAILED tests/webscraping/test_listserv.py::TestListservList::test__mailinglist_content - assert 0 == 1
FAILED tests/webscraping/test_listserv.py::TestListservList::test__to_dict - IndexError: list index out of range
FAILED tests/webscraping/test_listserv.py::TestListservList::test__to_pandas_dataframe - IndexError: list index out of range
FAILED tests/webscraping/test_listserv.py::TestListservList::test__to_mbox - FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/...
ERROR tests/webscraping/test_listserv.py::TestListservMessage::test__message_content - AttributeError: 'NoneType' object has no attribute 'parent'
ERROR tests/webscraping/test_listserv.py::TestListservMessage::test__to_dict - AttributeError: 'NoneType' object has no attribute 'parent'
ERROR tests/webscraping/test_listserv.py::TestListservMessage::test__to_mbox - AttributeError: 'NoneType' object has no attribute 'parent'
ERROR tests/webscraping/test_listserv.py::TestListservArchive::test__archive_content - IndexError: list index out of range
ERROR tests/webscraping/test_listserv.py::TestListservArchive::test__to_dict - IndexError: list index out of range
ERROR tests/webscraping/test_listserv.py::TestListservArchive::test__to_pandas_dataframe - IndexError: list index out of range
ERROR tests/webscraping/test_listserv.py::TestListservArchive::test__to_mbox - IndexError: list index out of range
================================================= 7 failed, 2 passed, 3 warnings, 7 errors in 34.45s =================================================
(bigbang0.3.0) ➜bigbang-0.3.0 #

Christovis commented 2 years ago

Hi @agt24 thanks for reaching out. Could you try it again by using:

Unlike with most other software, it is important to always stay up to date with BigBang, because BigBang has to keep up with the changes that take place on the web. The IEEE tests fail for your version of BigBang because the mailing list that was used to test the scraping no longer exists.
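To illustrate why a vanished list tends to surface as `AttributeError: 'NoneType' object has no attribute ...` rather than as a clear "page not found" message, here is a minimal sketch. It is not BigBang's code: the page snippets, the `archive` id, and both function names are hypothetical, and the standard library's `xml.etree.ElementTree` stands in for whatever HTML parser the scraper actually uses. The point is only that an element lookup returns `None` when the expected markup is gone, and any attribute access chained onto it then fails.

```python
import xml.etree.ElementTree as ET

# Hypothetical pages: one as a scraper would expect it,
# one as it might look after the mailing list was removed.
PAGE_WITH_LIST = (
    "<html><body><div id='archive'><a href='/msg1'>msg1</a></div></body></html>"
)
PAGE_LIST_GONE = "<html><body><p>List not found</p></body></html>"


def first_message_link(page: str):
    """Return the href of the first message link, or None if the archive is missing."""
    root = ET.fromstring(page)
    # find() returns None when no element matches the path.
    archive = root.find(".//div[@id='archive']")
    if archive is None:
        return None
    link = archive.find("a")
    return link.get("href") if link is not None else None


def first_message_link_unguarded(page: str):
    """Same lookup without the None check, to reproduce the error class."""
    root = ET.fromstring(page)
    # If the div is missing, this chains a method call onto None.
    return root.find(".//div[@id='archive']").find("a").get("href")
```

Calling `first_message_link_unguarded(PAGE_LIST_GONE)` raises an `AttributeError` on `NoneType`, the same class of error as in the test summary above, while the guarded version returns `None` so the caller can report the missing list explicitly.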

Christovis commented 2 years ago

If there are no further problems, the issue will be closed in the next few days.