Closed: amughal closed this issue 4 months ago
Thanks for your request! Everything looks good. :ok_hand:
A Governance Team member will review the information provided and get back to you soon.
To unblock Mongo as a Data Preparer in the absence of Spade, I asked Mongo to leverage BDE + LDN for the datasets he prepared months ago. The Slingshot community deems these Common Crawl datasets worth storing to preserve information for humanity.
The minimum Datacap requested is 500TB.
@Sunnyiscoming My total requested is 100TB and my minimum weekly request is 50TB. Are you saying that for BDE data publishing, the minimum requirement is 500TiB?
@amughal you should ask for 1000 TiB. Each dataset copy is ~100 TiB and Slingshot encourages 10 copies for disaster resiliency. If you want to apply for all of the Common Crawl datasets you prepared, this number would be even higher.
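The sizing here is straightforward arithmetic; a minimal sketch using the figures from the comment above (~100 TiB per prepared copy, 10 copies):

```python
# DataCap sizing using the figures quoted above.
copy_size_tib = 100   # each prepared Common Crawl dataset copy is ~100 TiB
target_copies = 10    # Slingshot encourages 10 copies for disaster resiliency

total_tib = copy_size_tib * target_copies
print(f"Total DataCap to request: {total_tib} TiB")  # -> 1000 TiB
```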
Thank you for the guidance; that makes sense, and now I understand what @Sunnyiscoming was suggesting. Let me update this request to reflect the dataset that is currently ready. I will post a request for the next round as more datasets become fully ready. Thank you
Thanks for your request! Everything looks good. :ok_hand:
A Governance Team member will review the information provided and get back to you soon.
Total DataCap requested
1000TiB
Expected weekly DataCap usage rate
50TiB
Client address
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
DataCap Allocation requested
Multisig Notary address
f02049625
Client address
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
DataCap allocation requested
25TiB
Id
04f1a47f-2177-44dd-a365-7e4c378aae43
This LDN is for the dataset related to the well-known MoonLanding project (Slingshot V3). The links and messages above preliminarily show that the data matches the requirements in the application. I support this and wish you all the best.
Your Datacap Allocation Request has been proposed by the Notary
bafy2bzacecivsf47ktrf5h4kww2ilfctpxoa5clkryuiqy32nfupqd47vvn5e
Address
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
Datacap Allocated
25.00TiB
Signer Address
f1tfg54zzscugttejv336vivknmsnzzmyudp3t7wi
Id
04f1a47f-2177-44dd-a365-7e4c378aae43
You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecivsf47ktrf5h4kww2ilfctpxoa5clkryuiqy32nfupqd47vvn5e
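The same status check can be scripted rather than done through the web UI. A minimal sketch, assuming Filfox exposes the message under an `/api/v1/message/<cid>` endpoint (inferred from the site's public URL scheme, not confirmed here):

```python
import requests

# Message CID from the notary proposal above.
MSG_CID = "bafy2bzacecivsf47ktrf5h4kww2ilfctpxoa5clkryuiqy32nfupqd47vvn5e"

resp = requests.get(f"https://filfox.info/api/v1/message/{MSG_CID}", timeout=30)
resp.raise_for_status()
msg = resp.json()

# A receipt exitCode of 0 means the multisig message executed successfully.
print("height:", msg.get("height"))
print("exitCode:", msg.get("receipt", {}).get("exitCode"))
```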
Your Datacap Allocation Request has been approved by the Notary
bafy2bzaceay26rg4lklh2jpwdmkeua5vzbs7zren6h6oft7ju7nmflyetk224
Address
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
Datacap Allocated
25.00TiB
Signer Address
f1pszcrsciyixyuxxukkvtazcokexbn54amf7gvoq
Id
04f1a47f-2177-44dd-a365-7e4c378aae43
You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceay26rg4lklh2jpwdmkeua5vzbs7zren6h6oft7ju7nmflyetk224
I have heard of the Moon Landing project before and would like to support @amughal and @xmcai2016
Thank you all, appreciated.
DataCap Allocation requested
Multisig Notary address
f02049625
Client address
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
DataCap allocation requested
50TiB
Id
c88fb01f-6f1c-4717-a914-6c2c598edab5
Stats & Info for DataCap Allocation
Multisig Notary address
f02049625
Client address
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
Rule to calculate the allocation request amount
100% of weekly dc amount requested
DataCap allocation requested
50TiB
Total DataCap granted for client so far
25TiB
DataCap to be granted to reach the total amount requested by the client
975TiB
Stats
| Number of deals | Number of storage providers | Previous DC Allocated | Top provider (%) | Remaining DC |
|---|---|---|---|---|
| 506 | 3 | 25TiB | 56.68 | 5.41TiB |
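The Remaining DC column shows how close the current tranche is to exhaustion; a quick sketch of that arithmetic (the ~75% trigger threshold for the subsequent-allocation bot is an assumption, not stated in this thread):

```python
# Tranche usage implied by the stats row above (values in TiB).
previous_allocation = 25.0
remaining = 5.41

used = previous_allocation - remaining
used_pct = 100 * used / previous_allocation
print(f"{used:.2f} TiB used ({used_pct:.1f}% of the tranche)")  # ~78% used

# Assumed rule: the bot proposes the next tranche once roughly 75% of the
# prior allocation is spent, which this crosses.
```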
@Sunnyiscoming @Kevin-FF-USA @galen-mcandrew @raghavrmadya @simonkim0515 Hello all, there seems to be a DataCap issue with this approval. I started sending large deals from this LDN to the SPs over the last two days, but as of this morning they are failing, and the status is asking for a signature again. Is this a weekly-allocation issue or a tranche issue? I'm trying to understand. Since I'm using a SaaS provider, I need to send deals ASAP. Any help is appreciated. Thanks
My initial request was to allocate 50TB per week. Can I get that increased to 100TB, please?
checker:manualTrigger
✔️ Storage provider distribution looks healthy.
⚠️ 100.00% of deals are for data replicated across less than 3 storage providers.
⚠️ CID sharing has been observed. (Top 3)
[^1]: To manually trigger this report, add a comment with text checker:manualTrigger
[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger
[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
Click here to view the CID Checker report. Click here to view the Retrieval report.
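As I read it, the replication warning counts, for each deal, how many distinct SPs hold the same piece. A minimal sketch of that metric over a hypothetical deal list (piece CIDs elided):

```python
from collections import defaultdict

# Hypothetical (piece, provider) deal pairs standing in for the on-chain
# deals the checker aggregates.
deals = [
    ("pieceA", "f01697248"),
    ("pieceA", "f01717477"),
    ("pieceB", "f01697248"),
]

# Distinct providers per piece.
providers_per_piece = defaultdict(set)
for piece, provider in deals:
    providers_per_piece[piece].add(provider)

# Share of deals whose piece sits on fewer than 3 distinct SPs.
low = sum(1 for piece, _ in deals if len(providers_per_piece[piece]) < 3)
print(f"{100 * low / len(deals):.2f}% of deals are for data "
      "replicated across less than 3 storage providers")
```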
@fabriziogianni7 @liyunzhi-666 Hello Notaries. I need the next tranche for this LDN. 1) I accidentally had a small set of CAR files mixed in with another LDN, but I will make sure that this won't happen again. 2) In the next round of data sealing, the next two SPs are fully geo diverse (Asia and US East Coast).
Please let me know if you have any questions.
Thanks
That's OK. But I supported your application in the last round, and per the rules I shouldn't support you in two consecutive rounds, so you should look for another notary. @amughal
Okay, thanks @liyunzhi-666, appreciated. I will reach out to others.
Checking with other Notaries. Hello, @simonkim0515 @xinaxu @kevzak Could someone please approve the next tranche?
Thanks
@amughal is a previous ESPA participant and reputable in the ecosystem. The dataset is public, so that checks out as well.
Approving the next DataCap tranche; however, I would like to see more replication across more SPs going forward.
Your Datacap Allocation Request has been proposed by the Notary
bafy2bzacebzzgpb5yts6nspipawuwwlwt4hxkyd2sbn2uotwswgodi5elwxae
Address
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
Datacap Allocated
50.00TiB
Signer Address
f1kqdiokoeubyse4qpihf7yrpl7czx4qgupx3eyzi
Id
c88fb01f-6f1c-4717-a914-6c2c598edab5
You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebzzgpb5yts6nspipawuwwlwt4hxkyd2sbn2uotwswgodi5elwxae
Thank you @jamerduhgamer. The next tranche will definitely be hosted by another SP and will also demonstrate geo redundancy.
@amughal Hi there
1. It would be clearer if you could tell us the details of the SP IDs you will work with for the next round. 2. You mentioned that you will have 10 copies. How will you improve data replication?
Hi @ipollo00, The next SPs are:
- South Korea: miner ID f01697248. Waiting for this tranche to start sealing.
- US East Coast: miner ID f01717477. Also waiting for the next tranche.
Thanks
Can you confirm that these SPs are really present at these locations, and that they are not using a VPN setup to disguise their location and exploit Fil+ for extra DataCap while actually located in Asia?
I will definitely ask them for more clarification.
Hi @ipollo00, I have received detailed replies from both SPs, with pictures attached as well.
South Korea (f01697248): I ran a traceroute to the Boost IP address and it does show Korea. I have also received pictures of their data center.
US East Coast (f01717477): Location: Atlanta USA, Sungard DC. ISP is Unitas Global
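One scriptable way to cross-check an SP's announced location is to pull its on-chain peer info and trace the resolved address. A sketch assuming the public Glif JSON-RPC gateway and the standard `Filecoin.StateMinerInfo` method:

```python
import json
import requests

RPC = "https://api.node.glif.io/rpc/v1"  # public gateway (assumption)

payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "Filecoin.StateMinerInfo",
    "params": ["f01697248", None],  # None = current chain head
}
info = requests.post(RPC, json=payload, timeout=30).json()["result"]

# Multiaddrs come back base64-encoded; decoding to IP:port needs the
# multiaddr wire format (e.g. the `multiaddr` package). The resolved IP
# can then feed a traceroute/GeoIP check against the claimed location.
print(json.dumps(info.get("Multiaddrs"), indent=2))
```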
DD: Both miners are reachable. However, f01697248 has received deals from 5 clients, and two of those clients' reports show data that is not retrievable; that may have been a long time ago. Two of the clients' reports show a fairly low retrieval rate, but that is acceptable from my end, and I will keep following up on the retrieval rate in this application. f01717477 has received deals from one client (as far as I can tell), and the results in its report were acceptable. Willing to sign for this round. Per the guidelines, if those two SPs do not appear in the next tranche, I'm afraid notaries may not support this in the future.
Your Datacap Allocation Request has been approved by the Notary
bafy2bzacecoz32dgq2ejzmhift3oiuizk54vctfjm5uwvi454wfj2kuce2dve
Address
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
Datacap Allocated
50.00TiB
Signer Address
f1n5wlrrhoxpkgwij25xrtt7w7g2k3fhbthmdn6ri
Id
c88fb01f-6f1c-4717-a914-6c2c598edab5
You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecoz32dgq2ejzmhift3oiuizk54vctfjm5uwvi454wfj2kuce2dve
Thanks @ipollo00 .
DataCap Allocation requested
Multisig Notary address
f02049625
Client address
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
DataCap allocation requested
100TiB
Id
d03a8e9f-bcb8-423d-bbcb-d58057309f8c
Stats & Info for DataCap Allocation
Multisig Notary address
f02049625
Client address
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
Rule to calculate the allocation request amount
200% of weekly dc amount requested
DataCap allocation requested
100TiB
Total DataCap granted for client so far
4547.5YiB
DataCap to be granted to reach the total amount requested by the client
4547.5YiB
Stats
| Number of deals | Number of storage providers | Previous DC Allocated | Top provider (%) | Remaining DC |
|---|---|---|---|---|
| 2086 | 6 | 50TiB | 53.49 | 11.44TiB |
The client has responded transparently to the request to include more SPs and more geo locations. They have also made their best attempt to check the VPN concern. As long as the retrievability concern is addressed, I am willing to continue supporting.
However, I will not approve this next tranche, as I was a part of the previous DataCap tranche allocation.
Hello @fabriziogianni7
Could you please help sign off on this tranche?
Thank you
checker:manualTrigger
⚠️ 1 storage provider sealed more than 50% of total datacap - f01697248: 59.65%
⚠️ 99.37% of deals are for data replicated across less than 4 storage providers.
⚠️ CID sharing has been observed. (Top 3)
[^1]: To manually trigger this report, add a comment with text checker:manualTrigger
[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger
[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.
@amughal what is your plan to solve:
⚠️ 99.37% of deals are for data replicated across less than 4 storage providers.
Hello @herrehesse. Per the report above, the data is already hosted with 6 unique providers. As more tranches become available, I hope to further increase the number of SPs and the geo diversity.
The report isn't perfect, but I'm willing to support it this round. Hopefully, we'll see improvement in the next round.
Your Datacap Allocation Request has been proposed by the Notary
bafy2bzacecysk7ypf4r2x7ogzdigte6mxep4352m7uv5oh64opyipoo4bmh62
Address
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
Datacap Allocated
100.00TiB
Signer Address
f174fg3bqbln3zjnkxtyf6s54txqkr7yqkj6cig7y
Id
d03a8e9f-bcb8-423d-bbcb-d58057309f8c
You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecysk7ypf4r2x7ogzdigte6mxep4352m7uv5oh64opyipoo4bmh62
Thank you @Aifabot-Cloud
Data Owner Name
Common Crawl
Data Owner Country/Region
United States
Data Owner Industry
Other
Website
https://commoncrawl.org/2020/11/october-2020-crawl-archive-now-available/
Social Media
Total amount of DataCap being requested
1000 TiB
Weekly allocation of DataCap requested
50TiB
On-chain address for first allocation
f1c6huyblzf4s42mwxp5g7hlse4vmxeqjxv4idldy
Custom multisig
Identifier
No response
Share a brief history of your project and organization
Is this project associated with other projects/ecosystem stakeholders?
Yes
If answered yes, what are the other projects/ecosystem stakeholders
Describe the data being stored onto Filecoin
Where was the data currently stored in this dataset sourced from
Other
If you answered "Other" in the previous question, enter the details here
How do you plan to prepare the dataset
singularity
If you answered "other/custom tool" in the previous question, enter the details here
No response
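On the Singularity answer above: a minimal sketch of kicking off a prep job from Python. The `singularity prep create <name> <path> <out>` signature is an assumption based on the v1 CLI and may differ across versions; all names and paths are placeholders:

```python
import subprocess

subprocess.run(
    [
        "singularity", "prep", "create",
        "common-crawl-oct-2020",   # dataset name (placeholder)
        "/data/common-crawl",      # source directory (placeholder)
        "/data/car-out",           # output directory for CAR files (placeholder)
    ],
    check=True,
)
```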
Please share a sample of the data
Confirm that this is a public dataset that can be retrieved by anyone on the Network
If you chose not to confirm, what was the reason
No response
What is the expected retrieval frequency for this data
Daily
For how long do you plan to keep this dataset stored on Filecoin
1.5 to 2 years
In which geographies do you plan on making storage deals
Greater China, Asia other than Greater China, Africa, North America, South America, Europe, Australia (continent)
How will you be distributing your data to storage providers
HTTP or FTP server
How do you plan to choose storage providers
Slack, Big data exchange
If you answered "Others" in the previous question, what is the tool or platform you plan to use
No response
If you already have a list of storage providers to work with, fill out their names and provider IDs below
No response
How do you plan to make deals to your storage providers
Boost client, Singularity
If you answered "Others/custom tool" in the previous question, enter the details here
No response
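On the Boost client answer above: a minimal sketch of submitting one verified deal by shelling out to `boost deal`. Flag names follow Boost's deal command as commonly documented, but check `boost deal --help` for your version; every value below is a placeholder:

```python
import subprocess

subprocess.run(
    [
        "boost", "deal",
        "--provider=f01697248",
        "--http-url=https://example.com/cc-0001.car",  # placeholder CAR URL
        "--commp=baga6ea4seaq...",    # piece CID (placeholder, elided)
        "--piece-size=34359738368",   # 32 GiB padded piece
        "--car-size=18010019221",     # CAR size in bytes (placeholder)
        "--payload-cid=bafy...",      # root CID (placeholder, elided)
        "--verified=true",            # verified deal, spends DataCap
    ],
    check=True,
)
```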
Can you confirm that you will follow the Fil+ guideline
Yes