Because even a single mailing list in the 3GPP and IEEE archives can be very large (e.g. 3GPP_TSG_GERAN_WG1 contains more than 4k messages and can take about 1 h to scrape), the server connection can break before the crawl has finished, resulting in an error such as:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='list.etsi.org', port=443): Max retries exceeded with url: /scripts/wa.exe?A2=ind0103&L=3GPP_TSG_GERAN_WG2&O=D&P=3211 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f24016bea90>: Failed to establish a new connection: [Errno -2] Name or service not known'))
Such an error should ideally be caught, and the messages already retrieved should be saved.
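A minimal sketch of what this could look like, not the project's actual implementation. The names `crawl_with_checkpoint`, `fetch`, and `out_path` are hypothetical. The sketch assumes each message is fetched by a callable that raises on connection failure; it catches `OSError` by default because `requests.exceptions.ConnectionError` derives from it (via `RequestException`), so the block stays runnable without `requests` installed.

```python
import json
import tempfile
from pathlib import Path


def crawl_with_checkpoint(urls, fetch, out_path, errors=(OSError,)):
    """Fetch each URL in order and always save what was retrieved.

    `fetch` is a hypothetical callable (url -> JSON-serialisable message).
    Returns (messages, failed_url); failed_url is None if the crawl completed.
    """
    messages = []
    failed_url = None
    try:
        for url in urls:
            try:
                messages.append(fetch(url))
            except errors:
                # Connection broke: remember where, then stop crawling.
                failed_url = url
                break
    finally:
        # Runs on normal exit, on a caught connection error, and on any
        # other exception, so retrieved messages are never lost.
        Path(out_path).write_text(json.dumps(messages))
    return messages, failed_url


if __name__ == "__main__":
    def flaky_fetch(url):
        # Stand-in for a real HTTP request; fails on the third message.
        if url.endswith("3"):
            raise ConnectionError("Name or service not known")
        return {"url": url, "body": "..."}

    out = Path(tempfile.mkdtemp()) / "messages.json"
    msgs, failed = crawl_with_checkpoint(
        ["msg1", "msg2", "msg3", "msg4"], flaky_fetch, out
    )
    print(f"saved {len(msgs)} messages, stopped at {failed}")
```

Returning the URL that failed would let a later run resume from that point instead of restarting a crawl that may take an hour.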