filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] SUT #492

Closed emf-developer closed 1 year ago

emf-developer commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

SUT was founded in 1966. It offers undergraduate and graduate programs across 14 main departments in science and engineering, and it has established several research centers. These centers maintain their own separate identities while co-existing within the university system. This arrangement gives individual researchers the opportunity and flexibility to conduct research while building a working relationship between the university and industry.
SUT has about 480 faculty members, 550 staff and more than 10,000 students.
SUT has many international activities and is known globally for its International Affairs (Iraq, Pakistan, Russia, Japan, Bolivia, Netherlands, India, ...).
SUT has decided to store its public scientific data in a decentralized manner on web3 storage platforms, to increase the accessibility and reliability of the information so that users can retrieve the data from anywhere in the world.

What is the primary source of funding for this project?

The SUT.

What other projects/ecosystem stakeholders is this project associated with?

None.

Use-case details

Describe the data being stored onto Filecoin

All files are public data of SUT's students, researchers and faculty, including theses and dissertations, lectures, books, articles and papers, periodicals, open data, open projects, and so on.

Where was the data in this dataset sourced from?

It is all public data of SUT's students, researchers and faculty.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://drive.google.com/file/d/1y8XkeBTLYYRLZ5vfUoiS0gVi7xo4h2G9/view?usp=sharing

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes. All data is open to the public.

What is the expected retrieval frequency for this data?

It depends on users' needs.

For how long do you plan to keep this dataset stored on Filecoin?

We intend to keep it as long as possible, as a permanent mirror.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

A few countries (about 6-10) across all regions, including Asia/Middle East.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

The data can be transferred both offline and online. It depends on SPs' location and online data transfer rate. Online is preferred.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

The stability and reachability of the data are important to us. A few parameters matter when choosing SPs: their location, reputation, transfer speed, and special features (e.g. fast retrieval).

How will you be distributing deals across storage providers?

It depends on each SP's capacity, deal experience and the status of data transmission. We will make sure that no single SP receives more than 25% of all deals.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes.

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 2 years ago

@emf-developer Hey. Could you send an email to filplus@fil.org with your official domain in order to confirm your identity?

emf-developer commented 2 years ago

@Sunnyiscoming The confirmation email has been sent, please check.

raghavrmadya commented 2 years ago

@emf-developer What is your relationship with SUT?

raghavrmadya commented 2 years ago

Additionally, I would like more clarity on your DataCap allocation plan. We require geographic diversity and while you mention that you will store in about 6-10 regions including Asia/Middle East, we need to have a more concrete plan. Are there SPs already waiting for DataCap on your end? Please provide as much information as possible on SP distribution and how you plan to move the data to SPs.

Sunnyiscoming commented 2 years ago

@emf-developer Any update here?

emf-developer commented 2 years ago

Sorry, I didn't check for a while. I'm Erfan Moh Far, R&D head of this project's team at SUT.

We plan to store our data for at least 1-2 years across various regions, especially Asia/Middle East, and the deals may be extended when they expire. The planned shares are:
4-5 SPs from Asia/Middle East (~50% of data)
2-3 SPs from Europe (~30% of data)
2-3 SPs from US/Canada (~20% of data)

We've never done this before, and SPs will be chosen based on their reputation and Deal Success Predictors' info.

Data will mainly be transferred online, especially to SPs that are far away, but we can also transfer offline. After completing our list of desired SPs, we will check whether offline transfer is possible (that is, whether the SP accepts offline transfer and is not far away). Online transfer is preferred. We have currently chosen a few SPs and are working to finish the list. The final list may vary.

Our allocation plan is based on trust over time. We will first make one deal with each miner to check the process and the result, and then make the rest of the deals.

@raghavrmadya @Sunnyiscoming
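
As a rough check of the regional plan above against the 25% per-SP cap stated in the application, the following sketch computes the range of per-SP shares in each region. It assumes data is split evenly among the SPs within a region, which the thread does not state; the names `PLAN`, `CAP` and `check_plan` are illustrative only.

```python
# Regional plan from the comment above: (share of data, min SPs, max SPs).
PLAN = {
    "Asia/Middle East": (0.50, 4, 5),
    "Europe": (0.30, 2, 3),
    "US/Canada": (0.20, 2, 3),
}
CAP = 0.25  # per-SP cap stated in the application


def per_sp_share_range(share, min_sps, max_sps):
    """Return (lowest, highest) per-SP share, assuming an even split (assumption)."""
    return share / max_sps, share / min_sps


def check_plan(plan, cap=CAP):
    """Map each region to (lowest share, highest share, within_cap)."""
    result = {}
    for region, (share, min_sps, max_sps) in plan.items():
        lowest, highest = per_sp_share_range(share, min_sps, max_sps)
        result[region] = (lowest, highest, highest <= cap)
    return result


if __name__ == "__main__":
    for region, (lowest, highest, ok) in check_plan(PLAN).items():
        print(f"{region}: {lowest:.1%} to {highest:.1%} per SP, within cap: {ok}")
```

Under the even-split assumption, the largest single-SP share is 15% (Europe with two SPs), so the plan as described stays under the cap.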

Sunnyiscoming commented 2 years ago

Can you share the list now?

emf-developer commented 2 years ago

Yes. The list below is our current choice but, as I said, the result of the first deal with each SP will determine the rest of the deals.
f01392893
f01345523
f022352
f03488
f01606675
f01479781
f01421708
f01264125
f01641612
@Sunnyiscoming

raghavrmadya commented 2 years ago

Datacap Request Trigger

Total DataCap requested

500TiB

Expected weekly DataCap usage rate

15TiB

Client address

f1lakkooooyscwt5crl3b5nj3szcnh5g2mcz3sxsa

raghavrmadya commented 2 years ago

Thanks @emf-developer. If you need support with finding SPs, you can check out Bigdataexchange and Filgram. You can also find support in the #fil-plus channel on Filecoin Slack.

emf-developer commented 2 years ago

@raghavrmadya I made a mistake: I thought that sending DC to an address would initialize it. The client address we gave had not been initialized before the DC was sent, and unfortunately I didn't receive the DC. I have just sent FIL to the address and it is OK now. Can you please send it again? Thank you for your support.

raghavrmadya commented 2 years ago

Hi @emf-developer, I'm unsure of your question. I cannot send DataCap. Two notaries must propose and approve the request to get your first allocation. Please refer to the guidelines here - https://github.com/filecoin-project/filecoin-plus-large-datasets

raghavrmadya commented 2 years ago

If your concern is different, please elaborate

emf-developer commented 2 years ago

Hi, yes, I know about the allocation process. I was only concerned because the DataCap address we gave was not initialized (no ID) at that time, and I thought that could cause a problem. Now everything is OK. We will wait for the notaries' approval, thanks. @raghavrmadya

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1lakkooooyscwt5crl3b5nj3szcnh5g2mcz3sxsa

DataCap allocation requested

7.5TiB
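
The 7.5TiB figure appears consistent with the first-allocation rule for large dataset applications, which is understood here to be the lesser of 5% of the total DataCap requested and 50% of the expected weekly usage rate. That rule is an assumption in this sketch (it is not stated anywhere in this thread), and the function name `first_allocation_tib` is hypothetical.

```python
def first_allocation_tib(total_requested_tib: float, weekly_rate_tib: float) -> float:
    """First tranche size in TiB.

    Assumed rule (not stated in this thread): the lesser of 5% of the
    total request and 50% of the expected weekly usage rate.
    """
    return min(0.05 * total_requested_tib, 0.50 * weekly_rate_tib)


if __name__ == "__main__":
    # Values from the request trigger: 500TiB total, 15TiB per week.
    print(first_allocation_tib(500, 15))
```

With the values from the request trigger, 5% of 500TiB is 25TiB and 50% of 15TiB is 7.5TiB, so the smaller value matches the amount the bot requested.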

psh0691 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecnntdnbcpo3mamxtdvdke4luqy4custyqioesp263s7lqh6ctyay

Address

f1lakkooooyscwt5crl3b5nj3szcnh5g2mcz3sxsa

Datacap Allocated

7.50TiB

Signer Address

f1qdko4jg25vo35qmyvcrw4ak4fmuu3f5rif2kc7i

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecnntdnbcpo3mamxtdvdke4luqy4custyqioesp263s7lqh6ctyay

dannyob commented 1 year ago

Hi, @raghavrmadya and @emf-developer -- I wonder if you could provide information on the current sanctions status of SUT? I've been able to establish that it is a target of financial sanctions by the current UK government. (See https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/1082660/iran__nuclear_.pdf , entity #122), and in the EU according to https://data.europa.eu/data/datasets/consolidated-list-of-persons-groups-and-entities-subject-to-eu-financial-sanctions?locale=en (I searched for EU.2734.5 which is SUT's EU reference number).

Note that in the US, there would be a potential exemption for any sanctions on academic data in the Berman Amendment, 50 U.S. Code § 1702(b)(3), and I haven't yet found a sanction on SUT in the US anyway. I don't know enough about other countries' laws to even comment on this, but I presume you've looked into it @emf-developer ?

emf-developer commented 1 year ago

Hi @dannyob. We are talking about the distribution of open scientific and academic data. Most countries and organizations, including the US, the EU and the UN, agree that sanctions must not affect open data communications. Our program is based on Freedom of Information laws, and we are about to store our open data publicly in the OPEN World for global use. We are aware of the rules and respect global communication norms, but we don't consider such injustices to be correct. Furthermore, as you said, the US government has made exemptions from regulations under the Berman Amendment for such cases.

Germany ministry of foreign affairs: https://www.auswaertiges-amt.de/en/aussenpolitik/laenderinformationen/iran-node/iran/218250
Finland ministry of foreign affairs: https://um.fi/sanctions-questions-and-answers#:~:text=1.6.%20Our%20university%20is%20actively%20searching%20for%20international%20opportunities.%C2%A0Could%20our%20plans%20be%20affected%20by%20sanctions%3F
treasury.gov: https://home.treasury.gov/news/press-releases/js1295
UN: https://news.un.org/en/story/2022/07/1122152
https://www.ohchr.org/en/press-releases/2022/07/unilateral-sanctions-threaten-scientific-research-and-academic-freedom-un
and many more...

On the other hand, we are all contributing to the Filecoin universe: a Web3, decentralized network, and the Filecoin Plus project, whose primary goal is to maximize the amount of useful storage. That goal rests on the fundamental goals of cryptocurrency platforms, and one of the most important rules of these decentralized, blockchain-based platforms is not to allow governments any influence over the non-discriminatory operation of their systems. It is disappointing for everyone who cares about freedom, knowledge and wisdom to see this kind of problem.

We hope there will be no other concerns about our application, and thank you for your attention and support.

GaryGJG commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebpslgm5royjkmqdrejnhexosqqdymut5qbopj3suguel7okeouyo

Address

f1lakkooooyscwt5crl3b5nj3szcnh5g2mcz3sxsa

Datacap Allocated

7.50TiB

Signer Address

f1zffqhxwq2rrg7rtot6lmkl6hb2xyrrseawprzsq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebpslgm5royjkmqdrejnhexosqqdymut5qbopj3suguel7okeouyo

BDE-io commented 1 year ago

@emf-developer Hi! Great to see you have gotten approval for DataCap and advancing the mission of preserving humanity’s most important information. If you are looking for more storage providers to store these data or have any questions, please visit #bigdata-exchange on Filecoin Slack or reply here.

We have strong demand from a diverse group of SPs, who are actively looking to onboard more data.

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ f01851060 has sealed 31.46% of total datacap.

⚠️ 29.59% of total deals sealed by f01421708 are duplicate data.

⚠️ 66.67% of total deals sealed by f022352 are duplicate data.

⚠️ 50.00% of total deals sealed by f01944347 are duplicate data.

⚠️ 50.00% of total deals sealed by f01345523 are duplicate data.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01851060 | Las Vegas, Nevada, US | 794.13 GiB | 31.46% | 794.13 GiB | 0.00% |
| f01421708 (new) | Tehran, Tehran, IR | 603.75 GiB | 23.92% | 425.13 GiB | 29.59% |
| f01786387 | Heerhugowaard, North Holland, NL | 468.88 GiB | 18.58% | 468.88 GiB | 0.00% |
| f01392893 | Amsterdam, North Holland, NL | 389.50 GiB | 15.43% | 372.88 GiB | 4.27% |
| f01953985 | Shenzhen, Guangdong, CN | 100.13 GiB | 3.97% | 100.13 GiB | 0.00% |
| f01278 | Grand Rapids, Michigan, US | 64.00 GiB | 2.54% | 60.00 GiB | 6.25% |
| f01923786 | Hong Kong, Central and Western, HK | 40.13 GiB | 1.59% | 40.13 GiB | 0.00% |
| f010088 | Everett, Washington, US | 34.00 GiB | 1.35% | 34.00 GiB | 0.00% |
| f01938671 | Hong Kong, Central and Western, HK | 28.13 GiB | 1.11% | 28.13 GiB | 0.00% |
| f022352 | Oslo, Oslo, NO | 384.00 MiB | 0.01% | 128.00 MiB | 66.67% |
| f01944347 | Maywood Park, Oregon, US | 256.00 MiB | 0.01% | 128.00 MiB | 50.00% |
| f01345523 | Antwerpen, Flanders, BE | 256.00 MiB | 0.01% | 128.00 MiB | 50.00% |
| f01937995 | Hong Kong, Central and Western, HK | 128.00 MiB | 0.00% | 128.00 MiB | 0.00% |
| f01952350 | Maywood Park, Oregon, US | 128.00 MiB | 0.00% | 128.00 MiB | 0.00% |
| f01926635 | Hong Kong, Central and Western, HK | 128.00 MiB | 0.00% | 128.00 MiB | 0.00% |
| f01949183 | Maywood Park, Oregon, US | 128.00 MiB | 0.00% | 128.00 MiB | 0.00% |
| f01947770 | Hong Kong, Central and Western, HK | 128.00 MiB | 0.00% | 128.00 MiB | 0.00% |
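
The warnings above can be reproduced from the table with two simple threshold checks. The report does not state its thresholds, so the values in this sketch (more than 25% of total DataCap for one provider, more than 20% duplicate data) are assumptions chosen only because they are consistent with which rows were flagged; `flag_rows` and the constants are hypothetical names, not the checker's actual code.

```python
SHARE_CAP_PCT = 25.0      # assumed per-provider share threshold
DUPLICATE_CAP_PCT = 20.0  # assumed duplicate-data threshold

# (provider, percentage of total sealed, duplicate percentage), from the table above.
ROWS = [
    ("f01851060", 31.46, 0.00),
    ("f01421708", 23.92, 29.59),
    ("f01786387", 18.58, 0.00),
    ("f01392893", 15.43, 4.27),
    ("f01953985", 3.97, 0.00),
    ("f01278", 2.54, 6.25),
    ("f01923786", 1.59, 0.00),
    ("f010088", 1.35, 0.00),
    ("f01938671", 1.11, 0.00),
    ("f022352", 0.01, 66.67),
    ("f01944347", 0.01, 50.00),
    ("f01345523", 0.01, 50.00),
    ("f01937995", 0.00, 0.00),
    ("f01952350", 0.00, 0.00),
    ("f01926635", 0.00, 0.00),
    ("f01949183", 0.00, 0.00),
    ("f01947770", 0.00, 0.00),
]


def flag_rows(rows, share_cap=SHARE_CAP_PCT, dup_cap=DUPLICATE_CAP_PCT):
    """Return warning strings: share warnings first, then duplicate warnings."""
    warnings = []
    for provider, sealed_pct, _ in rows:
        if sealed_pct > share_cap:
            warnings.append(f"{provider} has sealed {sealed_pct:.2f}% of total datacap.")
    for provider, _, dup_pct in rows:
        if dup_pct > dup_cap:
            warnings.append(f"{dup_pct:.2f}% of total deals sealed by {provider} are duplicate data.")
    return warnings


if __name__ == "__main__":
    for line in flag_rows(ROWS):
        print(line)
```

With these assumed thresholds the sketch flags the same five providers as the report, in the same order.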

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

⚠️ 99.88% of deals are for data replicated across less than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 1.61 TiB | 1.63 TiB | 1 | 65.98% |
| 297.75 GiB | 773.50 GiB | 2 | 30.64% |
| 26.00 GiB | 82.00 GiB | 3 | 3.25% |
| 128.00 MiB | 3.13 GiB | 15 | 0.12% |
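
The 99.88% figure in the warning can be approximately recomputed from the "Total Deals Made" column. This is a sketch that assumes the percentage is the share of total deal size sealed for data held by fewer than 4 providers; the table values are already rounded, so the result only matches to about two decimal places. The names `DEALS_GIB` and `pct_below` are illustrative.

```python
GIB_PER_TIB = 1024

# (number of providers, total deals made in GiB), from the table above.
DEALS_GIB = [
    (1, 1.63 * GIB_PER_TIB),
    (2, 773.50),
    (3, 82.00),
    (15, 3.13),
]


def pct_below(rows, min_providers=4):
    """Percentage of total deal size for data replicated on fewer than min_providers."""
    total = sum(size for _, size in rows)
    below = sum(size for providers, size in rows if providers < min_providers)
    return 100.0 * below / total


if __name__ == "__main__":
    print(f"{pct_below(DEALS_GIB):.2f}% of deals are below the replication target")
```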

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, so their deals should not resolve to the same CID.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!