This PR addresses #419 and includes the following:
- include the changes necessary to scrape both 3GPP and IEEE Listserv 16.5 archives
- resolve the TODO that was left in `ListservArchive.get_sections()`
- change `tests/webscraping/test_listserv.py` to use the IEEE archive, as it is smaller than 3GPP and thus faster to test
- add a check whether a mailing list is public when using `only_mlist_urls=True` for `ListservArchive` (noticed only when checking IEEE, where some lists are not public)
- include the complete list of public IEEE mailing lists in `examples/url_collections/listserv.IEEE.txt`
- update `examples/url_collections/listserv.3GPP.txt` according to the above changes
Note that the IEEE URL list .txt file contains only very few URLs, as most IEEE mailing lists are not public. Even if one creates an account for the IEEE archive, the messages that are accessible never reveal addresses in the header. Thus, for analysing IEEE archives, ConversationKG becomes important, as it can help to study the signatures of messages.
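
For reviewers, here is a rough sketch of the idea behind the public-list check. It is not the code in this PR: the helper names `looks_public` and `filter_public` and the marker strings in `LOGIN_MARKERS` are hypothetical, and it assumes a non-public list answers with a page that contains a login prompt.

```python
from typing import Dict, List, Sequence

# Hypothetical markers: the real archive pages may signal restricted
# access with different wording.
LOGIN_MARKERS: Sequence[str] = ("login required", "please log in")


def looks_public(html: str, markers: Sequence[str] = LOGIN_MARKERS) -> bool:
    """Return True if the page text contains none of the login markers."""
    text = html.lower()
    return not any(marker in text for marker in markers)


def filter_public(pages: Dict[str, str]) -> List[str]:
    """Keep only the URLs whose already-downloaded HTML looks public."""
    return [url for url, html in pages.items() if looks_public(html)]
```

The sketch works on HTML that has already been downloaded, so it can be tried without network access, for example `filter_public({"https://example.org/a": "<html>Archive index</html>"})`.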