cloudy-astrophysics / cloudy-users-group

A friendly forum for discussions on Cloudy

size and bandwidth limits on GitHub #4

Open CloudyLex opened 4 years ago

CloudyLex commented 4 years ago

The good news is that Marios and I got the entire head of the trunk, including "BigBoy", the 0.5 GB Rydberg-state radiative data, onto GitHub. That repo is https://github.com/cloudy-astrophysics/cloudy_lfs. Marios has an older Mac that cannot install lfs but was able to download the entire trunk, including BigBoy. So that all works.

I am concerned about the file and bandwidth limits on the academic GitHub license I have. The nublado.org log shows that our monthly download varies between 18 and 40 GB. It looks like the GitHub limit is 1 GB?

ogoann commented 4 years ago

Hi Gary,

The cost of additional bandwidth/space on GitHub LFS is $5 per 50 GB per month. How does that compare to the current storage costs?

However, I am still concerned that GitHub LFS is not the best way to go, because it requires the user to install the git-lfs extension as well (see my other question about how we expect users to install Cloudy in the future). I just downloaded the cloudy_lfs repository, and the "big boy" is not there - there is only the small reference file from the git-lfs extension:

version https://git-lfs.github.com/spec/v1
oid sha256:7f017707990b80ee7a9bc7f74c3da5bbe0769a0fe693a696e000ea10a30c277e
size 466744290
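
For anyone hitting the same surprise, a quick way to tell whether a checkout delivered the real data or only the pointer stub is to parse the stub format shown above (a Python sketch, nothing Cloudy-specific):

```python
# Sketch: detect whether a checked-out file is real data or a git-lfs
# pointer stub. Real stubs are tiny (~130 bytes) text files like the above.
def parse_lfs_pointer(text: str):
    """Return {'oid': ..., 'size': ...} if text looks like an LFS pointer,
    else None."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("version https://git-lfs.github.com/spec/"):
        return None
    fields = dict(line.split(" ", 1) for line in lines[1:] if " " in line)
    if "oid" not in fields or "size" not in fields:
        return None
    return {"oid": fields["oid"], "size": int(fields["size"])}

stub = """version https://git-lfs.github.com/spec/v1
oid sha256:7f017707990b80ee7a9bc7f74c3da5bbe0769a0fe693a696e000ea10a30c277e
size 466744290
"""
info = parse_lfs_pointer(stub)
print(info["size"])  # 466744290 - you got the pointer, not the 0.5 GB file
```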

Could the solution be to host the database files (these change very little) on some FTP we have access to (the current one?), and make it part of the makefile/installation process to download these database files directly from said FTP?
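
The install-time fetch could look something like this sketch (the URL is a hypothetical placeholder, not a real Cloudy endpoint; the checksum would ship with the source tree so a corrupted download is caught):

```python
# Sketch of an install-time data-file download with integrity check.
# The URL and checksum in the usage example are placeholders, not real
# Cloudy endpoints.
import hashlib
import urllib.request

def fetch_datafile(url: str, dest: str, sha256: str, chunk=1 << 20):
    """Download url to dest, streaming in 1 MB chunks, and verify its
    SHA-256 against the value shipped with the source tree."""
    digest = hashlib.sha256()
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        while True:
            block = resp.read(chunk)
            if not block:
                break
            digest.update(block)
            out.write(block)
    if digest.hexdigest() != sha256:
        raise RuntimeError(f"checksum mismatch for {dest}")

# Hypothetical usage:
# fetch_datafile("https://data.nublado.org/hydro_tpnl.dat",  # placeholder URL
#                "data/hydro_tpnl.dat", "<known sha256 hex digest>")
```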

Best, Anna

CloudyLex commented 4 years ago

Hi Anna,

I am even more concerned about the bandwidth. The current trunk is 1.2 GB with Big Boy and 0.73 GB without. The c17 checkout is 0.5 GB. The bandwidth stats I quoted were probably mainly c17 exports (although anybody could get our trunk or the whole repository). The citation rate for Cloudy's documentation is going up about 20% per year. I have not checked how the bandwidth is changing, but expect it is rising due to an increasing number of users and will rise due to the size of the checkout.

Big Boy was a design error. It allows for very high precision radiative data up in the Rydberg levels. As Robin pointed out, those data are known from asymptotic limits. We do not need precision up there since populations of high levels are dominated by collisions even for modest densities. The collision rates are mainly from Born approximation formulae (the so-called g-bar approximation) and are highly uncertain. Without Big Boy, allowing for some growth in the trunk over the next year, and a reasonable rate of usage increase, we will soon be looking at ~100 GB/month.

The current nublado.org is hosted by Webfaction, which was purchased by GoDaddy within the last year. Their future is unclear due to negative vibes about GoDaddy, but nothing has changed yet. We now pay Webfaction $10/month, which allows for 1 TB/month. The problems are that we have to maintain it, and my university is uncomfortable using grants to pay for external web sites (they are terrified of an OMB audit which would affect the NIH grants over in the medical school and university hospital as collateral damage). Similarly, the university has its computers behind a firewall which requires access through a VPN. They have a firm policy of discouraging computers being openly exposed to the internet. We can't host anything here - I have checked. (They did let me keep the Cloudy summer school site on cloud9 but said that was the limit of what they would allow.)

Like most universities, we have a relationship with Google and unlimited access to a university-related but Google-hosted Google Drive. Tarballs could be placed up there and made public. Users in China could not access that, because China blocks Google, but most Chinese universities have ways to get to the open web. But if development were out in the open on GitHub then anybody could clone whatever we have up there, so the bandwidth hit on GitHub might still be over the limit.

It is curious that there is no simple way to host a broadly used community project like Cloudy. Seems like NASA or NSF would have an interest in helping out directly. They do help with grants but that brings in our accountants.

thanks for any further ideas, Gary

-- Gary J. Ferland Physics, Univ of Kentucky Lexington KY 40506 USA Tel: 859 257-8795 https://pa.as.uky.edu/users/gary

ogoann commented 4 years ago

Hi Gary,

I am still thinking about this. I cannot find any information on whether a regular GitHub repository has any bandwidth limits; if it doesn't, then as long as you keep all files under 100 MB (and combine them with a makefile or something), things should be fine?
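
The split-and-reassemble idea might look like this sketch (the file names and the 95 MB safety margin are assumptions, not anything Cloudy currently does):

```python
# Sketch of "keep every file under 100 MB": split a big data file into
# chunks for the repo, and have the build step reassemble it.
import os

CHUNK = 95 * 1024 * 1024  # stay safely under GitHub's 100 MB hard limit

def split_file(path: str, chunk=CHUNK):
    """Write path.000, path.001, ... each at most `chunk` bytes;
    return the list of part names."""
    parts = []
    with open(path, "rb") as src:
        i = 0
        while True:
            block = src.read(chunk)
            if not block:
                break
            part = f"{path}.{i:03d}"
            with open(part, "wb") as out:
                out.write(block)
            parts.append(part)
            i += 1
    return parts

def join_files(parts, dest):
    """Reassemble the original file from its chunks (the makefile step)."""
    with open(dest, "wb") as out:
        for part in sorted(parts):
            with open(part, "rb") as src:
                out.write(src.read())
```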

Best, Anna

ogoann commented 4 years ago

Gary,

I have no time right now to look into this in detail, but I wonder if https://zenodo.org/ could be the answer, as a place where we could store versions of Cloudy for users to download, and/or just the big files. There seems to be a 50 GB size limit (but exceptions can be granted, and with Cloudy's citation count this should be no problem - we don't even need this currently). There is no download limit for users. One way I see this working is that this is where the databases are stored, as well as tarred files for installation, and GitHub is used for code development, sharing user examples and Cloudy papers, and user questions.

https://about.zenodo.org/policies/ https://about.zenodo.org/terms/ -a

Morisset commented 4 years ago

It should be quite easy to test any bandwidth limit: how many downloads do we need to reach it? The zip file on GitHub is 185 MB. A git clone leads to a package of 910 MB; if I tar.gz this package, I obtain a 364 MB file (182 MB are in the .git directory...). Anyway, downloading the package costs close to 200 MB. Gary said the mean download is 20 GB/month, i.e. close to 100 downloads. Can each of us install Cloudy 20 times during the next 24 h, and we will see if any limitation is reached? Christophe
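
Christophe's estimate checks out with one line of arithmetic (all sizes approximate, as quoted above):

```python
# Back-of-the-envelope check of the numbers above (all sizes approximate).
tarball_mb = 200   # each download moves ~200 MB, per the tar.gz figure above
monthly_gb = 20    # roughly the low end of the 18-40 GB/month server log
downloads_per_month = monthly_gb * 1024 / tarball_mb
print(round(downloads_per_month))  # prints 102, i.e. ~100 downloads/month
```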

-- Dr. Christophe MORISSET Tel: +52 646 174 45 80 ext 230 Instituto de Astronomia UNAM Apdo. Postal 106, C.P. 22800 Ensenada Baja California MEXICO

CloudyLex commented 4 years ago

CERN funds https://zenodo.org/

In past encounters with them, they have been free and open with Euros but had different rules for this side of the pond. They might be ideal if they will let us in.

thanks, Anna! Gary


ogoann commented 4 years ago

I think this is supposed to be extremely open, and I am not aware of any geographical restrictions. I also believe they are committed to making sure that this is a stable place to store scientific data. Btw, they also have a "Communities" feature that could be useful for Cloudy workshops? Haven't really checked it out thoroughly though.

CloudyLex commented 4 years ago

My interactions with CERN were over software they developed, and may have been well more than a decade ago. I agree that the landscape is different (better) now.


ogoann commented 4 years ago

ADS lists https://ui.adsabs.harvard.edu/search/p_=0&q=%20full%3A%22zenodo%22&sort=date%20desc%2C%20bibcode%20desc more than 2k citations to Zenodo records just this year!

CloudyLex commented 4 years ago

It would be interesting to hear what Robin thinks about this - he would be far more tuned into what is going on over there.

CloudyLex commented 4 years ago

Following through on the ADS links, I got mostly broken links to arXiv and one that linked to a copy of the paper on Zenodo. People use it for publication archives?

ogoann commented 4 years ago

Hi Gary, unsure what you mean. The point I was trying to make is that Zenodo seems to be more and more popular in astro (just looking at the statistics). Specific papers either cite other people's codes that they used, or link to their own. I believe that arXiv is down right now, hence the broken links.

-a

Morisset commented 4 years ago

There are ways to connect a GitHub repository with a Zenodo account; I used it to store the Pyneb and pyCloudy packages. It automatically creates a new Zenodo version when a new tag is made on GitHub, and it gives you a DOI for the code and for each version.

Example for pyneb, which I wrongly named Pyneb_devel, while a devel branch was only necessary for what I wanted to do: https://doi.org/10.5281/zenodo.1246922 Ch.


will-henney commented 4 years ago

A couple of quick responses to this. I tend to agree with Anna that LFS is not the way to go. Here is a comment of mine from the private email thread of last month:

Thanks for pointing that out - I had forgotten that conversation from 18 months ago. Are you sure that LFS is needed, though? According to the following page, the hard limits are 100 MB per file and 100 GB (!!!) per repository.

https://help.github.com/en/github/managing-large-files/what-is-my-disk-quota

They do mention a softer limit of 1 GB, where they send you a polite email, but hopefully that is negotiable. I just looked at my copy: the .git directory is 1.1 GB, and the only seriously large file is data/hydro_tpnl.dat at 448 MB - I can't find anything else bigger than 100 MB.

If it were possible to keep under the 100MB-per-file limit, then I think we would not have to pay anything on github, however large the total bandwidth use is.

I would also second the idea of using Zenodo or similar to host the released versions of Cloudy. That would mean that it would only be developers and power-users who would be directly downloading from Github. The zenodo size limit is 50 GB per record, which gives plenty of room for growth (a record would be, for instance, a single released version).

CloudyLex commented 4 years ago

Hi Will, thanks for the comments - these are intense days, setting up the new group and figuring out how to sunset the old one.

A single checkout of the trunk is over 1 GB - I don't have the numbers right now. The bandwidth allowance on GitHub is so small that we could not do two checkouts in a month. I don't see this working - do you? You know far more than I do.

The "big boy", the huge file that is a good fraction of the checkout, was a design error and will be removed. That will help.

any suggestions on how to move the user community from yahoo to groups.io? Everyone must have received the email when the transfer happened. Most people are busy and will not pay attention. Gary


will-henney commented 4 years ago

Hi Gary,

But the bandwidth limits only apply if using LFS. There are no bandwidth limits on regular repos, are there? Just the restriction of no individual file larger than 100 MB.

I will open a separate thread on your yahoo query

ogoann commented 4 years ago

I second Will. If we split all files into 100 MB chunks and use GitHub only for development, while Zenodo is used for downloading the code by users, then this is a long-term sustainable solution.

I also want to remind us that GitHub offers free web hosting, meaning that the webpage (which could of course link to groups.io and Zenodo) and wiki can all be in one place. This makes it much easier to maintain the webpage in the future, since its code would also be on GitHub and people can submit issues/requests/bugs and generally help out with maintaining it.