filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] X-Order Lab #936

Closed Leozhang404 closed 1 year ago

Leozhang404 commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

X-Order Lab is a decentralized laboratory focusing on basic theoretical research, and is committed to building an academic research organization that is well known around the world.

What is the primary source of funding for this project?

This project is our interest and funded by our personal income.

What other projects/ecosystem stakeholders is this project associated with?

No.

Use-case details

Describe the data being stored onto Filecoin

The data include the teaching material, books, slides, book club videos/audios and lecture materials etc.

Where was the data in this dataset sourced from?

 - The books and teaching material are published publicly.
 - The slides are made by ourselves for the presentation of the book club.
 - The videos and audios are recorded by ourselves.
 - The lecture materials are made by our team members, experts and scholars.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://www.bilibili.com/video/BV1d441117nj
https://www.bilibili.com/video/BV1sJ411E7VS
https://www.bilibili.com/video/BV1MJ411y7Uk
https://www.bilibili.com/video/BV17J411Q7Rb
https://www.bilibili.com/video/BV1kJ411D7bn
https://www.bilibili.com/video/BV1kJ411D7QR
https://www.bilibili.com/video/BV1FE411h7pT
https://www.bilibili.com/video/BV1cE411Y7pS
https://www.bilibili.com/video/BV1s441117TG
https://www.bilibili.com/video/BV1d4411g75n

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes. All data is public and can be retrieved by anyone on the Network.

What is the expected retrieval frequency for this data?

About once a week.

For how long do you plan to keep this dataset stored on Filecoin?

At least 500 days.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Asia, EU, North America, and America.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

We'll upload the data to a web server or IPFS nodes, and swan-client (https://github.com/filswan/go-swan-client) will be used to distribute the data to storage providers.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We'll send deals to storage providers through the FilSwan platform. Market Matcher, a module of the platform, will choose the best storage providers for us, which ensures that the storage providers accept retrieval deals.

How will you be distributing deals across storage providers?

We are using the FilSwan client agent to send deals in batches: https://github.com/filswan/swan-client
FilSwan has a reputation module, the Swan reputation system (https://docs.filswan.com/filswan-platform/overview/reputation-system), which scores storage providers on their data storage behavior. The score is based on Time-based Reachability + Regional Weighted Adjusted Power + General Deals and Verified-Storage Provider Deals.

The FilSwan Auction System will match storage providers based on reputation and conditions. The bidding policy is described here: https://docs.filswan.com/filswan-platform/overview/filswan-auction-system.

Some of the storage providers are as follows:
f0143858
f03624
f010088
f02301
f0187709
f01402814
f01859603
f01133080
f01858429
f01398391
f01072221
f0240185
f01390330
f01784458
f01840390
f01870135
f0520660
f01871352
f01883179
f01886797

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes. We have enough funding, and we'll ask the FilSwan team for some help.

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

raghavrmadya commented 2 years ago

Datacap Request Trigger

Total DataCap requested

2PiB

Expected weekly DataCap usage rate

90TiB

Client address

f13zfhkvtf6mewd74jhav6iqcggwsojht73tickga

large-datacap-requests[bot] commented 2 years ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f13zfhkvtf6mewd74jhav6iqcggwsojht73tickga

DataCap allocation requested

45TiB

raghavrmadya commented 2 years ago

@flyworker, tagging as application mentions FilSwan. Please provide input if you know of the client to help build trust with notaries

flyworker commented 2 years ago

@raghavrmadya Thanks for letting me know. Interest in using FilSwan as a DataCap distributor is rising in the community. I believe they have some channel, or are in a community, that uses it. I have the following questions for @Leozhang404

Leozhang404 commented 2 years ago

@raghavrmadya Thanks for letting me know. Interest in using FilSwan as a DataCap distributor is rising in the community. I believe they have some channel, or are in a community, that uses it. I have the following questions for @Leozhang404

  • Where did you hear about FilSwan?
  • Which tool will you use for sending out deals?
  • Do you have a preferred region?

Thanks @flyworker for your reply

  1. FilSwan is well known in the China region; there is a WeChat group of 200+ developers. SPs get deals daily, so we know the project and tools well. Recently the DataDAO hackathon has been spreading through many WeChat groups, which is how we know this platform.

  2. We have tried the swan client tool (https://github.com/filswan/go-swan-client), and it works well.

  3. We have no geolocation preference, but we may prefer to have more copies in China, since the content has a larger audience there.

As a voluntary distributed research organization, we do not have enough storage space in the long run for our important research and activity data, so we hope to find a suitable long-term storage method. I'm one of the community members of FilSwan and have been following their project progress. As far as I know, FilSwan has a set of mature tools that can help us store our data onto the Filecoin network conveniently. Through the FilSwan official website, their personnel, and the community, we've also learned that they have many providers who could support the storage. We therefore believe choosing FilSwan is a good way for us.

neogeweb3 commented 2 years ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebwdd27vmpsjltagb6qvkmshm7qwavbhjk6scajdz3yq32m3jrog4

Address

f13zfhkvtf6mewd74jhav6iqcggwsojht73tickga

Datacap Allocated

45.00TiB

Signer Address

f13k5zr6ovc2gjmg3lvd43ladbydhovpylcvbflpa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebwdd27vmpsjltagb6qvkmshm7qwavbhjk6scajdz3yq32m3jrog4

kernelogic commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecipypq4i3gvsyvpniclkgmgdrjbiq2i2x7rncsilg266dzqixweg

Address

f13zfhkvtf6mewd74jhav6iqcggwsojht73tickga

Datacap Allocated

45.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecipypq4i3gvsyvpniclkgmgdrjbiq2i2x7rncsilg266dzqixweg

BDE-io commented 2 years ago

@Leozhang404 Hi! Great to see you have gotten approval for DataCap. If you are looking for more storage providers to store these data or have any questions, please visit #bigdata-exchange on Filecoin Slack or reply here.

We have strong demand from a diverse group of SPs, who are actively looking to onboard more data.

large-datacap-requests[bot] commented 2 years ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f01858410

Client address

f13zfhkvtf6mewd74jhav6iqcggwsojht73tickga

DataCap allocation requested

90TiB

large-datacap-requests[bot] commented 2 years ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f13zfhkvtf6mewd74jhav6iqcggwsojht73tickga

Last two approvers

kernelogic & neogeweb3

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

90TiB

Total DataCap granted for client so far

45TiB

Datacap to be granted to reach the total amount requested by the client (2PiB)

1.95PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| undefined | undefined | 45TiB | 12.62 | 9.37TiB |

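
The allocation figures in this report can be checked with a little arithmetic. The sketch below is an illustration only, not the bot's actual code. It assumes 1 PiB = 1024 TiB, that the "rule" is a percentage applied to the expected weekly usage rate from the request trigger (90TiB), and that the bot truncates the remaining amount to two decimals (an inference from the reported 1.95PiB). The function names are hypothetical.

```python
import math

TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def tranche_tib(weekly_tib, rule_percent):
    """Size of one allocation request, given the weekly rate and the rule percentage."""
    return weekly_tib * rule_percent / 100


def remaining_pib(total_pib, granted_tib):
    """DataCap still to be granted, truncated (assumption) to two decimals."""
    remaining = total_pib - granted_tib / TIB_PER_PIB
    return math.floor(remaining * 100) / 100


# Request number 2: 100% of the 90 TiB weekly rate, with 45 TiB granted so far.
print(tranche_tib(90, 100))   # 90.0
print(remaining_pib(2, 45))   # 1.95
```

Under the same assumptions, a 200% rule gives a 180 TiB request, and 135 TiB granted out of 2 PiB leaves 1.86 PiB.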
IreneYoung commented 2 years ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedlfcqyqqj67yixcqhvtk4222vwkbfhxpa26grkufiyfm26wl3f3s

Address

f13zfhkvtf6mewd74jhav6iqcggwsojht73tickga

Datacap Allocated

90.00TiB

Signer Address

f1d4gmpqz3execjj2wvrxuuhvbms5mzh7t7yqrviq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedlfcqyqqj67yixcqhvtk4222vwkbfhxpa26grkufiyfm26wl3f3s

cryptowhizzard commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebw2lxjbm2lyuxcfmsi42hjo2orezc3cd2si5w6mvw4qcuh2sqtuc

Address

f13zfhkvtf6mewd74jhav6iqcggwsojht73tickga

Datacap Allocated

90.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebw2lxjbm2lyuxcfmsi42hjo2orezc3cd2si5w6mvw4qcuh2sqtuc

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider has taken a verified deal, it is marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ f01947280 has sealed 26.93% of total datacap.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01947280 | Hangzhou, Zhejiang, CN | 16.34 TiB | 26.93% | 16.09 TiB | 1.53% |
| f01786736 | Saint-Gabriel, Quebec, CA | 5.00 TiB | 8.24% | 5.00 TiB | 0.00% |
| f01896422 | Fremont, California, US | 4.06 TiB | 6.69% | 4.06 TiB | 0.00% |
| f01225882 | Burnaby, British Columbia, CA | 4.06 TiB | 6.69% | 4.06 TiB | 0.00% |
| f01858429 | Boston, Massachusetts, US | 4.03 TiB | 6.64% | 4.03 TiB | 0.00% |
| f01390330 | Xi’an, Shaanxi, CN | 3.69 TiB | 6.08% | 3.69 TiB | 0.00% |
| f0240185 | Clifton, New Jersey, US | 3.53 TiB | 5.82% | 3.53 TiB | 0.00% |
| f0143858 | Clifton, New Jersey, US | 3.44 TiB | 5.66% | 3.44 TiB | 0.00% |
| f01402814 | Singapore, Singapore, SG | 3.00 TiB | 4.94% | 3.00 TiB | 0.00% |
| f01886797 | Vancouver, British Columbia, CA | 2.97 TiB | 4.89% | 2.97 TiB | 0.00% |
| f01163272 | Perm, Perm Krai, RU | 2.03 TiB | 3.35% | 2.03 TiB | 0.00% |
| f010088 | Everett, Washington, US | 1.97 TiB | 3.24% | 1.97 TiB | 0.00% |
| f0187709 | Moscow, Moscow, RU | 1.75 TiB | 2.88% | 1.75 TiB | 0.00% |
| f01222595 | Moscow, Moscow, RU | 1.66 TiB | 2.73% | 1.66 TiB | 0.00% |
| f08399 | Seattle, Washington, US | 1.50 TiB | 2.47% | 1.50 TiB | 0.00% |
| f0214334 | Gwangju, Gwangju, KR | 1.00 TiB | 1.65% | 1.00 TiB | 0.00% |
| f0717969 | Los Angeles, California, US | 448.00 GiB | 0.72% | 448.00 GiB | 0.00% |
| f01683871 | Gwangju, Gwangju, KR | 160.00 GiB | 0.26% | 160.00 GiB | 0.00% |
| f01955030 new | Hangzhou, Zhejiang, CN | 32.00 GiB | 0.05% | 32.00 GiB | 0.00% |
| f01708981 | Shenzhen, Guangdong, CN | 32.00 GiB | 0.05% | 32.00 GiB | 0.00% |
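
The percentage column can be reproduced from the sealed totals, as in the rough sketch below. It is an illustration, not the checker's implementation: the function names are hypothetical, GiB values are converted at 1024 GiB per TiB, and the 25% limit is an assumption (the report flags 26.93% but does not state the threshold it applies).

```python
# Total deals sealed per provider in TiB, copied from the table above.
DEALS_TIB = {
    "f01947280": 16.34, "f01786736": 5.00, "f01896422": 4.06,
    "f01225882": 4.06, "f01858429": 4.03, "f01390330": 3.69,
    "f0240185": 3.53, "f0143858": 3.44, "f01402814": 3.00,
    "f01886797": 2.97, "f01163272": 2.03, "f010088": 1.97,
    "f0187709": 1.75, "f01222595": 1.66, "f08399": 1.50,
    "f0214334": 1.00, "f0717969": 448 / 1024, "f01683871": 160 / 1024,
    "f01955030": 32 / 1024, "f01708981": 32 / 1024,
}


def sealed_shares(deals_tib):
    """Each provider's share of all sealed data, as a percentage."""
    total = sum(deals_tib.values())
    return {provider: 100 * tib / total for provider, tib in deals_tib.items()}


def over_limit(shares, limit_percent=25.0):
    """Providers whose share exceeds the limit (the 25% default is an assumption)."""
    return [provider for provider, share in shares.items() if share > limit_percent]


shares = sealed_shares(DEALS_TIB)
print(round(shares["f01947280"], 2))
print(over_limit(shares))
```

With the table's figures, only f01947280 exceeds the assumed limit, which matches the single warning in the report.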

Provider Distribution

Deal Data Replication

The table below shows how the unique data are replicated across storage providers.

⚠️ 30.07% of deals are for data replicated across less than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 18.00 TiB | 18.25 TiB | 1 | 30.07% |
| 480.00 GiB | 3.28 TiB | 7 | 5.41% |
| 544.00 GiB | 4.25 TiB | 8 | 7.00% |
| 512.00 GiB | 4.50 TiB | 9 | 7.42% |
| 576.00 GiB | 5.63 TiB | 10 | 9.27% |
| 512.00 GiB | 5.50 TiB | 11 | 9.06% |
| 672.00 GiB | 7.88 TiB | 12 | 12.98% |
| 576.00 GiB | 7.31 TiB | 13 | 12.05% |
| 128.00 GiB | 1.75 TiB | 14 | 2.88% |
| 160.00 GiB | 2.34 TiB | 15 | 3.86% |
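
The 30.07% warning follows directly from this table: it is the share of total deal volume that sits in rows with fewer than 4 providers. A minimal sketch of that calculation, with a hypothetical function name and the rows copied from the table (not the checker's own code):

```python
# (number of providers, total deals made in TiB), one tuple per table row.
REPLICATION_ROWS = [
    (1, 18.25), (7, 3.28), (8, 4.25), (9, 4.50), (10, 5.63),
    (11, 5.50), (12, 7.88), (13, 7.31), (14, 1.75), (15, 2.34),
]


def low_replication_percent(rows, min_providers=4):
    """Percentage of deal volume replicated across fewer than min_providers providers."""
    total = sum(deals for _, deals in rows)
    low = sum(deals for providers, deals in rows if providers < min_providers)
    return 100 * low / total


print(round(low_replication_percent(REPLICATION_ROWS), 2))
```

Only the single-provider row falls under the threshold of 4, so the warning figure equals that row's deal percentage.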

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually different applications own different data and should not resolve to the same CID.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Verifier |
| --- | --- | --- | --- | --- |
| f1o54sve7ede7im4caux3ug7lsyjmbue7ss3zzl6y | FilSwan | 59.25 TiB | 562 | LDN v3 multisig |
| f1bstbq5bi72kyovhh7zoo2f6l6uivsjz4ey5dnqq | FilSwan | 27.72 TiB | 298 | LDN v3 multisig |
| f1j22hqwh5tijbaztl7a7plzk5q2s4cesd67nyv3a | FilSwan | 7.72 TiB | 62 | LDN v3 multisig |
| f3v7x4a2aapgx6o2r477tenoin3u5oadaeqyd7kjdsitykvf4ok7vq2utcyh34lmu5u7oybs25ff6s4dbudpma | LeoCheung - Slingshot Restore | 5.53 TiB | 33 | LDN v3 multisig |
| f3rwypb3nhyzkslf6eb2qoiaqdwvaedcawxiav4hmhi4cp4w7ptjua42knqoddbl5uabcrtzgq7jajvjroz54a | dropcool - Slingshot Restore | 1.63 TiB | 33 | LDN v3 multisig |
| f14r2jybmccwiu6hze4fu55jyhclktvwacec56hea | FilSwan | 544.00 GiB | 11 | LDN v3 multisig |
| f13d6zt424jkp55u7kp67azotkzadtlokwnz2ntxa | FilSwan - Slingshot Restore | 192.00 GiB | 6 | LDN v3 multisig |
| f1r3d25hl2y7rqlsu2mgczdethy4qqjmkfdlmibfq | NEXRAD - FilSwan | 96.00 GiB | 3 | LDN v3 multisig |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f01858410

Client address

f13zfhkvtf6mewd74jhav6iqcggwsojht73tickga

DataCap allocation requested

180TiB

Id

1175e6c0-ab1f-4a48-b423-fd3177fbd20b

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f13zfhkvtf6mewd74jhav6iqcggwsojht73tickga

Last two approvers

cryptowhizzard & IreneYoung

Rule to calculate the allocation request amount

200% of weekly dc amount requested

DataCap allocation requested

180TiB

Total DataCap granted for client so far

135TiB

Datacap to be granted to reach the total amount requested by the client (2PiB)

1.86PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 3187 | 23 | 90TiB | 16.88 | 22.03TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it is marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

⚠️ f01981571 has unknown IP location.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01225882 | Burnaby, British Columbia, CA<br>Astute Hosting Inc. | 4.06 TiB | 4.35% | 4.06 TiB | 0.00% |
| f01947280 | Hangzhou, Zhejiang, CN<br>China Mobile communications corporation | 16.34 TiB | 17.50% | 16.09 TiB | 1.53% |
| f01955030 new | Hangzhou, Zhejiang, CN<br>China Mobile communications corporation | 32.00 GiB | 0.03% | 32.00 GiB | 0.00% |
| f01390330 | Xi’an, Shaanxi, CN<br>CHINANET-BACKBONE | 3.69 TiB | 3.95% | 3.69 TiB | 0.00% |
| f01708981 | Shenzhen, Guangdong, CN<br>CHINANET-BACKBONE | 32.00 GiB | 0.03% | 32.00 GiB | 0.00% |
| f01858429 | Boston, Massachusetts, US<br>Comcast Cable Communications, LLC | 4.03 TiB | 4.32% | 4.03 TiB | 0.00% |
| f0240185 | Clifton, New Jersey, US<br>DigitalOcean, LLC | 3.53 TiB | 3.78% | 3.53 TiB | 0.00% |
| f0143858 | Clifton, New Jersey, US<br>DigitalOcean, LLC | 3.44 TiB | 3.68% | 3.44 TiB | 0.00% |
| f01938223 | Montréal, Quebec, CA<br>eStruxture Data Centers Inc. | 11.25 TiB | 12.05% | 11.25 TiB | 0.00% |
| f01786736 | Saint-Gabriel, Quebec, CA<br>eStruxture Data Centers Inc. | 5.00 TiB | 5.35% | 5.00 TiB | 0.00% |
| f08399 | Seattle, Washington, US<br>Isomedia, Inc. | 1.50 TiB | 1.61% | 1.50 TiB | 0.00% |
| f0214334 | Gwangju, Gwangju, KR<br>Korea Telecom | 1.00 TiB | 1.07% | 1.00 TiB | 0.00% |
| f01683871 | Gwangju, Gwangju, KR<br>Korea Telecom | 160.00 GiB | 0.17% | 160.00 GiB | 0.00% |
| f0717969 | Los Angeles, California, US<br>Krypt Technologies | 448.00 GiB | 0.47% | 448.00 GiB | 0.00% |
| f01222595 | Moscow, Moscow, RU<br>MTS PJSC | 1.66 TiB | 1.77% | 1.66 TiB | 0.00% |
| f01402814 | Singapore, Singapore, SG<br>StarHub Ltd | 3.00 TiB | 3.21% | 3.00 TiB | 0.00% |
| f01970622 | Hong Kong, Central and Western, HK<br>UCLOUD INFORMATION TECHNOLOGY (HK) LIMITED | 14.19 TiB | 15.19% | 14.19 TiB | 0.00% |
| f01981571 | Unknown<br>Unknown | 5.72 TiB | 6.12% | 5.72 TiB | 0.00% |
| f01896422 | Fremont, California, US<br>Hurricane Electric LLC | 4.06 TiB | 4.35% | 4.06 TiB | 0.00% |
| f0187709 | Moscow, Moscow, RU<br>MTS PJSC | 1.75 TiB | 1.87% | 1.75 TiB | 0.00% |
| f01163272 | Lys’va, Perm Krai, RU<br>PJSC Rostelecom | 2.03 TiB | 2.18% | 2.03 TiB | 0.00% |
| f0705704 | Taipei, Taiwan, TW<br>UCLOUD INFORMATION TECHNOLOGY (HK) LIMITED | 1.53 TiB | 1.64% | 1.53 TiB | 0.00% |
| f010088 | Everett, Washington, US<br>Wholesail networks LLC | 1.97 TiB | 2.11% | 1.97 TiB | 0.00% |
| f01886797 | Vancouver, British Columbia, CA<br>Zayo Bandwidth | 2.97 TiB | 3.18% | 2.97 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how the unique data are replicated across storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 31.19 TiB | 31.44 TiB | 1 | 33.67% |
| 7.72 TiB | 15.44 TiB | 2 | 16.53% |
| 480.00 GiB | 3.75 TiB | 8 | 4.02% |
| 544.00 GiB | 4.78 TiB | 9 | 5.12% |
| 512.00 GiB | 5.00 TiB | 10 | 5.35% |
| 576.00 GiB | 6.19 TiB | 11 | 6.63% |
| 512.00 GiB | 6.00 TiB | 12 | 6.43% |
| 672.00 GiB | 8.53 TiB | 13 | 9.14% |
| 576.00 GiB | 7.88 TiB | 14 | 8.43% |
| 128.00 GiB | 1.88 TiB | 15 | 2.01% |
| 160.00 GiB | 2.50 TiB | 16 | 2.68% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually different applications own different data and should not resolve to the same CID.

However, this could be possible if all the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Approvers |
| --- | --- | --- | --- | --- |
| f1o54sve7ede7im4caux3ug7lsyjmbue7ss3zzl6y | FilSwan | 74.56 TiB | 601 | 3 cryptowhizzard<br>3 IreneYoung<br>1 jamerduhgamer<br>1 Joss-Hua<br>9 kernelogic<br>2 liyunzhi-666<br>1 xingjitansuo |
| f1bstbq5bi72kyovhh7zoo2f6l6uivsjz4ey5dnqq | FilSwan | 41.47 TiB | 670 | 3 cryptowhizzard<br>1 IreneYoung<br>7 kernelogic<br>2 liyunzhi-666<br>1 psh0691 |
| f1j22hqwh5tijbaztl7a7plzk5q2s4cesd67nyv3a | FilSwan | 19.63 TiB | 321 | 3 cryptowhizzard<br>4 kernelogic<br>1 liyunzhi-666 |
| f3v7x4a2aapgx6o2r477tenoin3u5oadaeqyd7kjdsitykvf4ok7vq2utcyh34lmu5u7oybs25ff6s4dbudpma | LeoCheung - Slingshot Restore | 5.56 TiB | 33 | 1 IreneYoung<br>1 Joss-Hua<br>1 liyunzhi-666<br>1 MetaWaveInfo<br>1 neogeweb3<br>1 psh0691 |
| f3rwypb3nhyzkslf6eb2qoiaqdwvaedcawxiav4hmhi4cp4w7ptjua42knqoddbl5uabcrtzgq7jajvjroz54a | dropcool - Slingshot Restore | 1.63 TiB | 33 | 2 flyworker<br>1 IreneYoung<br>1 neogeweb3 |
| f14r2jybmccwiu6hze4fu55jyhclktvwacec56hea | FilSwan | 544.00 GiB | 11 | 3 cryptowhizzard<br>2 dannyob<br>1 kernelogic<br>2 MegTei<br>2 neogeweb3<br>1 Reiers |
| f13d6zt424jkp55u7kp67azotkzadtlokwnz2ntxa | FilSwan - Slingshot Restore | 192.00 GiB | 6 | 4 cryptowhizzard<br>1 dannyob<br>2 IreneYoung<br>2 kernelogic<br>2 MegTei<br>1 psh0691<br>1 Reiers<br>1 s0nik42<br>1 swatchliu |
| f1r3d25hl2y7rqlsu2mgczdethy4qqjmkfdlmibfq | NEXRAD - FilSwan | 96.00 GiB | 3 | 1 cryptowhizzard<br>2 IreneYoung<br>1 jamerduhgamer<br>5 kernelogic<br>1 liyunzhi-666<br>1 Reiers<br>1 xingjitansuo |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! :exclamation: We have found some problems in the information provided.

 - We could not find Organization Name field in the information provided
 - We could not find Website / Social Media field in the information provided
 - We could not find Total amount of DataCap being requested (between 500 TiB and 5 PiB) field in the information provided
 - We could not find Weekly allocation of DataCap requested (usually between 1-100TiB) field in the information provided
 - We could not find On-chain address for first allocation field in the information provided
 - We could not find Data Type of Application field in the information provided

Please, take a look at the request and edit the body of the issue providing all the required information.

aggregation-and-compliance-bot[bot] commented 11 months ago

Client f01929808 does not follow the datacap usage rules. More info here. This application has been failing the requirements for 7 days. Please take appropriate action to fix the following DataCap usage problems.

| Criteria | Threshold | Reason |
| --- | --- | --- |
| Cid Checker score | > 25% | The client has a CID checker score of 19%. This should be greater than 25%. To find out more about CID checker score please look at this issue: https://github.com/filecoin-project/notary-governance/issues/986 |
| Shared data percent | < 20% | 42.59% of the client's data is shared with other clients. This should be less than 20% |
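
The two criteria above amount to simple threshold checks. The sketch below shows one way they could be evaluated. It is a hypothetical illustration and not the compliance bot's code: the function name and message wording are invented here, and only the two thresholds (score above 25%, shared data below 20%) come from the report.

```python
def failed_criteria(cid_checker_score, shared_data_percent):
    """Return a message for each DataCap usage criterion the client fails."""
    failures = []
    # Criterion 1: CID checker score must be strictly greater than 25%.
    if not cid_checker_score > 25:
        failures.append(
            "Cid Checker score is %s%%, should be greater than 25%%" % cid_checker_score
        )
    # Criterion 2: data shared with other clients must be strictly less than 20%.
    if not shared_data_percent < 20:
        failures.append(
            "Shared data percent is %s%%, should be less than 20%%" % shared_data_percent
        )
    return failures


# Values reported for this client: score 19%, shared data 42.59%.
for message in failed_criteria(19, 42.59):
    print(message)
```

With the reported values both checks fail, which is consistent with the two rows in the bot's table.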