petermr opened this issue 8 years ago
This looks like it may be an OS issue (OSX), not a Node.JS one. We might have success increasing the ulimit.
This is a general unix feature, where the operating system prevents runaway programs from destroying the machine. We definitely don't want to raise the ulimit, but we do need to figure out why so many file handles are being kept open and make sure we keep the number of concurrently open handles down to a reasonable limit.
I think the reason is tied to #58. At the moment the code spawns as many attempted downloads as there are links, without checking how many concurrent operations are already in progress. I think the reason we don't usually see this is that people don't usually run searches that return over 1024 papers.
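For anyone hitting this in the meantime, the general fix is to cap how many downloads are in flight at once instead of firing a request for every link immediately. Below is a minimal sketch of that idea in plain Node.js using a small worker pool; this is not the getpapers code, and the names (`downloadAll`, `limit`, the output filenames) are made up for illustration.

```js
const https = require('https');
const fs = require('fs');

// Download a single URL to a local file, resolving when the write finishes.
function downloadOne(url, dest) {
  return new Promise((resolve, reject) => {
    const file = fs.createWriteStream(dest);
    https.get(url, (res) => {
      res.pipe(file);
      file.on('finish', () => file.close(resolve));
    }).on('error', (err) => {
      file.close();
      reject(err);
    });
  });
}

// Run at most `limit` downloads concurrently: start `limit` workers and let
// each one pull the next URL only after its previous download has finished,
// so we never hold more than `limit` sockets / file handles open at once.
function downloadAll(urls, limit = 20) {
  let next = 0;
  async function worker() {
    while (next < urls.length) {
      const i = next++;
      await downloadOne(urls[i], `paper_${i}.pdf`);
    }
  }
  return Promise.all(Array.from({ length: limit }, () => worker()));
}

// Usage sketch:
// downloadAll(listOfPdfUrls, 20).then(() => console.log('done'));
```

With this pattern the number of open handles stays around `limit` regardless of how many results the query returns, which is why the problem only shows up on very large result sets with the current unbounded approach.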
I actually just successfully downloaded 2539 results without hitting this problem (or a timeout). This may need further investigation.
Has there been any progress on dealing with this issue, for example by limiting how many concurrent operations can take place? I'm trying to download a large number of papers (many tens of thousands) and I keep getting timeouts and crashes.
Yes, this is fixed by #87, though it is still waiting for review. It is basically the same issue as #58, except we are also saturating the number of local file handles as well as the network connection. If you want, you could always check out #87 and test whether it solves your problem.
Also @robintw, you mentioned you had possibly found a workaround for this bug in contentmine/getpapers#74. Is it different from what happens in #87?