JusteRaimbault / BiblioData


Running small network (dozen papers) through Windows #7

Open Bonnie-Buyuklieva opened 2 years ago

Bonnie-Buyuklieva commented 2 years ago

Hi Juste. Opening an issue on the feature mentioned: implementing waiting times between requests.

Processor: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz 1.99 GHz
Installed RAM: 16.0 GB (15.8 GB usable)
Edition: Windows 11 Home

Let me know where I can help. Thank you!

JusteRaimbault commented 2 years ago

Hi Bonnie! Thanks for opening the issue, it is better to track this here and link it with the code! One clarification regarding network size: is it your seed corpus that has a dozen papers, or the final intended network? And what is the approach to build it: do you already have all the papers and only need to retrieve the citation links between them, or do you also need to retrieve more citing papers?

Bonnie-Buyuklieva commented 2 years ago

Thank you for picking this up so swiftly! Sorry if I didn't pose the question right.

There are a dozen seed papers, and fewer would also work for my purposes. The general problem is: given these X papers, how many documents does it take to link each one to another? (For example, in the migration network, it took almost 10 papers on average to link any document to another; here I would be looking for the shortest path length.) I think this would need more citing papers.
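To make the question concrete, here is a minimal sketch of the shortest path computation, assuming the citation links have already been collected as `(citing, cited)` pairs and that direction is ignored when linking papers. The function names and the paper ids are hypothetical and are not part of BiblioData.

```python
from collections import deque


def build_undirected(citations):
    """Build an undirected adjacency map from (citing_id, cited_id) pairs."""
    graph = {}
    for citing, cited in citations:
        graph.setdefault(citing, set()).add(cited)
        graph.setdefault(cited, set()).add(citing)
    return graph


def shortest_path_length(graph, source, target):
    """Return the number of citation links on a shortest path, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    # Breadth-first search: papers are visited in order of distance from source.
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # target is not reachable in the collected network


# Toy data with hypothetical ids: seeds A and B are both cited by paper X.
edges = [("X", "A"), ("X", "B"), ("Y", "B"), ("Z", "C")]
graph = build_undirected(edges)
print(shortest_path_length(graph, "A", "B"))  # 2 links, via the citing paper X
print(shortest_path_length(graph, "A", "C"))  # None: no path in this toy data
```

A result of `None` between two seeds would mean that the collected layers are not yet deep enough to connect them.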

For reference, the key documents with Google Scholar links are:

Do you think it is feasible?

Bonnie



JusteRaimbault commented 2 years ago

Thanks for the details! As I see it now, obtaining the shortest path (or the best proxy of it that we can get) already implies quite a large network: the first paper is cited 1599 times, so if the other seed papers have citation counts of the same order of magnitude, we would have a first layer of ~10,000 papers. The second layer is difficult to estimate, but it can go up to 100,000, and both layers are necessary to obtain some connectivity. We could just collect the first layer in "waiting time" mode on a local computer so it is faster, but I am not sure the result would be usable for what you want to do. Anyway, I will try to implement the feature soon and document it better, along with better documentation of the broader use. Will update you here soon!
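For illustration only, a "waiting time" mode could look roughly like the sketch below: a random pause is inserted between consecutive requests. This is not the BiblioData implementation; `collect_with_waiting_time`, `estimated_hours`, the `fetch` callable, and the 2 to 5 second bounds are all assumptions made for the example.

```python
import random
import time


def collect_with_waiting_time(items, fetch, min_wait=2.0, max_wait=5.0,
                              sleep=time.sleep):
    """Call fetch(item) for each item, pausing a random time between calls."""
    results = {}
    for i, item in enumerate(items):
        if i > 0:
            # Wait before every request except the first one.
            sleep(random.uniform(min_wait, max_wait))
        results[item] = fetch(item)
    return results


def estimated_hours(n_requests, min_wait, max_wait):
    """Rough time spent waiting, using the mean of the uniform pause."""
    mean_wait = (min_wait + max_wait) / 2
    return n_requests * mean_wait / 3600


# Stand-in for a real request; the ids are hypothetical.
def fake_fetch(paper_id):
    return f"citing papers of {paper_id}"


out = collect_with_waiting_time(["p1", "p2", "p3"], fake_fetch,
                                min_wait=0.01, max_wait=0.02)
print(out["p2"])

# If each of ~10,000 first-layer papers needed one request (an assumption):
print(round(estimated_hours(10_000, 2.0, 5.0), 1), "hours of waiting")
```

The `sleep` parameter is injectable so the pauses can be replaced by a stub when testing; the time estimate ignores the duration of the requests themselves.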

Bonnie-Buyuklieva commented 2 years ago

I did fear this might be the case, but it is good to know for sure! Thanks.

As for documentation or implementation: happy to help where I can, especially from mid-February onwards.

