filecoin-project / notary-governance

Proposal: Project Beacon - 11.8 PiB Data Set / 60 PiB DataCap #564

Closed Beacon-Edu closed 1 year ago

Beacon-Edu commented 2 years ago

We are a Chinese education group in the Fortune Global 500. We would like to propose a collaborative plan to onboard a total of 60 PiB onto the Filecoin network. This will represent at least 5 full replicas of an 11.8PiB data set; further details of the project are below.

Project Description

We have an 11.8PiB project to prove the value proposition of decentralized data storage for our group. These data sets are the output of our online and offline educational materials from the past 10 years, including courses, documents, and classroom recordings videos. To protect the privacy of students and teachers, our group expects the data for this project to be kept confidential, that is, stored as encrypted data.

Our education group has been working with large (PiB-scale) data sets for over 6 years, and we are interested in the Filecoin network as a way to reduce costs and to address some of our backup data issues. Starting with the 11.8PiB project makes sense for us, as it represents a small portion of our complete archive.

Data Set

The dataset contains courses from more than 100 schools, documentation managed by teams of teachers, and over 3,600 videos of lessons recordings. As previously mentioned, the data itself is not of use to anyone except our group due to the privacy of students and teachers, and the behavior of teachers in the classroom. Therefore, this data will be encrypted for our own consideration.

Chart1

Chart2

Website/Social Media

Due to the privacy concerns of many parties involved in our educational programs, our website will only be available to PL and the relevant notaries.

Transparency in KYC

Our group is one of the top education groups in China, and we will go through the KYC process to validate our credentials. This includes face-to-face meetings between notaries, PL and our technical team to present and verify sample data.

We understand that encrypting the data makes it harder for Filecoin Plus to verify this project. However, we are committed to being as transparent as we can, for example through the previously mentioned KYC with notaries, PL and storage providers, and by disclosing our official email and data transfer details along with sample data. All parties involved in this project will, however, be required to sign confidentiality commitment letters to protect the security of our data.

We would like to express our gratitude to the community for allowing the project to move forward. In working closely with Protocol Labs and the Filecoin Foundation, we will follow these recommendations as a path forward to make this project a success for us and the network. 1. Submit a Proposal Issue in the Notary Governance repo for the entire project. 2. Submit DataCap applications totaling 60PiB.

  1. At least four notaries have agreed to support these LDNs, while we are in constant contact with additional notaries
  2. We have found five SPs so far, and we are constantly looking for more and are in contact with technically strong SPs in China

Data Storage Plan

5 full replicas, a total of 60 PiB of DataCap

Due to government policy, our encrypted data sets can only be sealed by SPs in China. Our current primary SP partners are listed in the table below:

image

Notaries that Support the Project [person/org/ Notary ID]

Updated with additional notaries from comments below.

Edit for clarification: Notaries that have expressed an interest in performing diligence on this project and client, and are willing to be placed on the notary msig address for approving allocations after performing ongoing diligence.

  1. ByteBase, f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi (Lead Notary), GCR
  2. Fenbushi, f1yqydpmqb5en262jpottko2kd65msajax7fi4rmq, GCR
  3. Firefly, f1fg6jkxsr3twfnyhdlatmq36xca6sshptscds7xa, Asia
  4. Pluskit, f1tgnlhtcmhwipfm7thsftxhn5k52velyjlazpvka, Asia
  5. Metawave, (@MetaWaveInfo ), f1ktlkcxnmzxcdaoqfsunrg3vocfbmgv4n3mrn74a, Oceania
  6. Tianji Studio (@liyunzhi-666 ), f1pszcrsciyixyuxxukkvtazcokexbn54amf7gvoq, GCR
  7. Kernelogic (@kernelogic ), f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa, NA
  8. Binghe Web3.0 Lab (@MRJAVAZHAO ), f14gme3f52prtyzk6pblogrdd6b6ivp4swc6qmesi, GCR
  9. Speedium, (@cryptowhizzard), f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa, Europe
  10. Origin Storage, (@Tom-OriginStorage), f1q6bpjlqia6iemqbrdaxr2uehrhpvoju3qh4lpga, NA
  11. PiKNiK, (@Kevin-PiKNiK), f1ypuqpi4xn5q7zi5at3rmdltosozifhqmrt66vhq, NA

fireflyHZ commented 2 years ago

We have talked with the client of the Beacon Project and are willing to support this one.

cryptowhizzard commented 2 years ago

Hello and good morning.

This application should not go under regular LDN.

LDN applications need to be stored and distributed on different continents. This one is China only, and this is not the spirit of LDN. LDN applications need to have public data, not encrypted data. LDN applications need to have their data publicly retrievable by everyone for the duration of the deals (also outside China).

I recommend you take this to the Fil Enterprise stage and put this under the special Fil Enterprise program.

MetaWaveInfo commented 2 years ago

We are very interested in the storage of private data, which will be the trend and future of the Filecoin network. We'd like to be added to the list of notaries willing to support. Thank you!

Destore2023 commented 2 years ago

Thanks for joining us @MetaWaveInfo! As the Lead Notary of the Beacon project, we welcome more notaries to join us in supporting it. Thanks to @cryptowhizzard for the tip; it's true that this is not a regular LDN application.

We know very well how difficult it is for traditional clients to store on Filecoin, but as an important ecosystem partner of the community, we think it is necessary for ByteBase to contribute to Mr. Juan Benet's vision of "5PiB DC sealed everyday" and the last community meeting's goal of completing "200PiB DC sealed & 175 active LDNs" by July 26. Therefore, following PL's and ByteBase's suggestion, the client first submitted a public proposal. Actually, ByteBase had several rounds of communication with Keren (keren@protocol.ai), Frank, Stefaan, Deep, Galen and others from PL & Filecoin Foundation before the submission, and also helped PL & FF arrange specific video conferences with the client.

Regarding @cryptowhizzard's understanding of the LDN rules, I would like to make some clarifications:

  1. LDN does not stipulate that the client must distribute data to different continents, only that the allocation should be spread across as many different nodes as possible.
  2. Yes, the regular LDN requires a public dataset. So for private data, we need to follow the template of the Antarctica Project https://github.com/filecoin-project/notary-governance/issues/489 and submit the proposal in the GOVERNANCE channel first. This was the suggestion from PL, which you may not be aware of.
  3. As with the second point, there is no mandatory requirement for private data to be retrievable by anyone.

Regarding your suggestion of the Fil Enterprise program, we are already following and strongly supporting the Fil+ E project with Meg. Based on the stage and target timeframe in her previous proposal, this new process may be expected to be officially released in September to October, which may not meet the needs of our client. Therefore, we will assist our client with the DataCap application process for the Beacon project as recommended by PL.

At present, we are also helping the client find more notaries and SPs to join. Implementing such a project requires not only DataCap and technical support, but also many SPs with sufficient FIL. Thanks all!

cryptowhizzard commented 2 years ago

Hello @swatchliu

Don’t get me wrong here, I am all in favour of evolving the network as long as it is beneficial for everyone. I am also in favour of a separate ruleset for special locations, as there are cultural differences at play.

What I do think is wrong here is that there is no community oversight. If you don’t make this data retrievable (outside) China, there is no way for us to check whether what you store is valid, and that is the foundation of the LDN. This oversight was to be built into Fil-E.

I can submit a 25000 EiB datacap request with private and encrypted data this way .. no one can check.

Can you make suggestions on how to provide oversight on this project, how to verify that the data is legit, and how we as a community can keep trust that you are doing the right thing here?

dkkapur commented 2 years ago

@cryptowhizzard for what it's worth, this path + what Antarctic went through are what is E-Fil+ until we come up with something better. This is currently the way to request DataCap for these scenarios.

I can submit a 25000 EiB datacap request with private and encrypted data this way .. no one can check.

Reasonable and correct. My take on this (cc @galen-mcandrew since he originally mentioned this in a prior conversation) is that we typically get (1) info on the client, (2) a sample of the data itself, ideally also shown post retrieval / some link to prove it's the right data being stored, and (3) info on the SPs storing this data. (3) combined with the CIDs stored on the network tends to be the biggest variable in ensuring the right things are happening on the network. It is rare to get (1) + (2) as well, especially in cases of enterprise clients. Getting (3) + either (1) or (2) seems to meet the burden of proof, at least given the tools we have access to today.

@swatchliu given that deals cannot be distributed outside, I do think it would be good to still get some notary approval from outside the region. Perhaps by having conversations with them and/or sharing a sample of the data? Antarctic had at least 3 regions covered across the selected notaries.

It could also be interesting to share some details on timelines and if all the deals will be progressing in parallel. We can also set up CID tracking to see how files are replicated across deals with the different SP IDs to prove replication?
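The CID tracking idea above can be sketched roughly as follows. This is only an illustration: the deal records, CID strings, and SP IDs are made-up placeholders, and real tracking would read deal state from the chain rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical deal records as (cid, sp_id) pairs. These values are
# placeholders for illustration, not real deals from this project.
deals = [
    ("cid-a", "sp-1"),
    ("cid-a", "sp-2"),
    ("cid-a", "sp-3"),
    ("cid-b", "sp-1"),
    ("cid-b", "sp-1"),  # same SP twice: still only one distinct holder
]

def replicas_per_cid(deal_records):
    """Map each CID to the set of distinct SP IDs that hold it."""
    holders = defaultdict(set)
    for cid, sp_id in deal_records:
        holders[cid].add(sp_id)
    return dict(holders)

def under_replicated(deal_records, target):
    """Return the CIDs stored with fewer than `target` distinct SPs."""
    by_cid = replicas_per_cid(deal_records)
    return sorted(cid for cid, sps in by_cid.items() if len(sps) < target)
```

Counting distinct SP IDs per CID, rather than counting deals, is what separates genuine replication from the same provider sealing one file several times.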

Beacon-Edu commented 2 years ago

Thanks @dkkapur for your suggestions, we will try to get support from notaries in at least three regions and from more SPs. In fact, it is not easy to find SPs with so much FIL in reserve for pledging, and most of them stopped negotiating because of pledge issues.

Therefore, we plan to proceed in phases, starting the first phase 1/12-4/12, and launching the second phase 5/12-8/12 when there are enough different SPs to join, followed by the third phase 9/12-12/12 within this year.

We plan to select 5-10 different SPs per each phase, with a total of 15-30 SPs selected for all three phases, and the number of SPs will only increase.

Our KYC can be carried out simultaneously. We will complete all the preparations with PL and the notaries supporting the Beacon Project. Many thanks to ByteBase for coordinating and providing services for us.

Hope it will be a pleasant experience!

PluskitOfficial commented 2 years ago

The client has contacted us before and we have learned the details. We are willing to support the project.

During the two years we've been involved with the Filecoin project, we've seen very little storage in education, not to mention such a large volume. We were very excited when @Beacon-Edu spoke about the project.

We, of course, are well aware of the importance of verifying clients and avoiding self-dealing. So we have confirmed a lot of detailed information. For example, we have gained some knowledge about the client by searching online and asking our friends, and we have established the credibility of the client.

We also raised doubts about their storage plan. However, we're well aware of the particularity of educational data, and the client has previously said that they will show data samples through meetings with PL (the proposal shows that it will indeed do so). Therefore, we currently believe that this problem has also been solved.

In a word, as a new notary, we have tried our best to verify the first few LDN projects. We do think it's a good project and we are willing to support it.

liyunzhi-666 commented 2 years ago

It seems to be fine, I would like to support it, but need to know more about it. @Beacon-Edu ben

kernelogic commented 2 years ago

I'd like to support this initiative as a North American notary. We need more diversity among such LDNs than just the Antarctica Project https://github.com/filecoin-project/notary-governance/issues/489, and they should be considered equally.

cryptowhizzard commented 2 years ago

Can I ask who is building this dataset for you @Beacon-Edu and what tools are used for that? How do you intend to distribute the data (CAR files?)

Fenbushi-Filecoin commented 2 years ago

The client has contacted us and we would like to support the project. We just need some notaries from NA and Europe to take a look at the application and support it.

Tom-OriginStorage commented 2 years ago

We would like more justification of how storing this large amount of data will benefit the Filecoin ecosystem. Also, if possible, we would like to see proof that 11.8PiB of data is just a fraction of the archive (such as other datasets the company has and is not yet planning to store on Filecoin). With the current description of the data, it does feel like these 11.8PiB are all the data the company has, as opposed to a portion of the archive.

Beacon-Edu commented 2 years ago

Thank you all for the great support, it appears that we have gotten support from 4 regions.

First of all, allow me to thank @cryptowhizzard and @Tom-OriginStorage for giving us thoughtful tips and sharing their experience.

We plan to select 5-10 different SPs per phase, with a total of 15-30 SPs selected for all three phases, and the number of SPs will only increase.

@cryptowhizzard the encryption tools and the process will be handled exclusively by our technical department, pardon me if I can't be very specific. The distribution to SP is planned as above.

@Tom-OriginStorage yes, you can see 11.8PiB as our total data set. Since we will have 5 replicas stored, this is exactly the reason why we are applying for 60PiB.
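As a quick check of the replica arithmetic above: 5 replicas of 11.8PiB come to 59PiB, so a 60PiB request leaves about 1PiB of headroom. A minimal sketch follows; the function and constant names are illustrative only.

```python
DATASET_PIB = 11.8     # data set size stated in the proposal
REPLICAS = 5           # number of full replicas stated in the proposal
REQUESTED_PIB = 60.0   # total DataCap requested

def datacap_needed(dataset_pib, replicas):
    """Total DataCap (PiB) needed to store `replicas` full copies."""
    return dataset_pib * replicas

needed = datacap_needed(DATASET_PIB, REPLICAS)
headroom = REQUESTED_PIB - needed
print(f"{needed:.1f} PiB needed, {headroom:.1f} PiB headroom")
```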

Again, we appreciate all your comments and suggestions! We will prepare carefully with the implementation team before launching the project.

Tom-OriginStorage commented 2 years ago

Hmm, if a company is just dumping their entire video archive on Filecoin (encrypted + only in China), I don't really see the reason to provide them with datacap. The only beneficiaries here are the SPs, notaries (since they referred the SPs) and the company itself.

The Filecoin ecosystem doesn't benefit at all from supporting Project Beacon under the current LDN program. While I understand that Filecoin wants to reach its KPI for data sealed and the client has a deadline to adhere to, I really still think that the company should either re-apply under Fil-Enterprise or just go ahead and store the data without DataCap.

My reasoning is this: I can easily ask a company like Youtube or Bytedance to archive their data on Filecoin, and a company like this can easily max out the network capacity with just a fraction of their data. And if the Filecoin network were to actively subsidize companies to unload all of their archived data, I think there will definitely be economic repercussions.

cryptowhizzard commented 2 years ago

@Tom-OriginStorage there is a clear benefit when regions with different cultural "things" can participate and lock up collateral instead of feeling unwelcome in the ecosystem and selling everything they have.

If there is a clear path for KYC and a clear path for oversight then i am in favor of having participation.

What I would like to know is how the oversight is done and how the packing is done. @Beacon-Edu, you are using public LDN value from the community for your benefit. Your technical department should realise that they need to open up and describe how your process for packing these amounts of data works, what software tooling you use, and how you distribute the data, so we can learn from it and do this together as a community.

I.e. if you develop on Linux (open source, like Filecoin) then you can't commercialise that and make it closed source.

Destore2023 commented 2 years ago

We would be very glad to see, and would actively support, more notaries inviting giant companies like Youtube and Bytedance to join Fil+ to expand the influence of Filecoin.

UnionLabs2020 commented 2 years ago

We should allow different voices in the discussion, just as with the Antarctic Project, but I still tend to support this proposal, because storing non-public datasets must be the real goal of Filecoin. We have to take this step forward. I believe that all community members who really love Filecoin will understand this.

If nothing changes, nothing will remain.

Tom-OriginStorage commented 2 years ago

I need to clarify my stance as I see most people here are misunderstanding what I mentioned.

I have nothing against the company/client storing on Filecoin at all, I am against using the current LDN for the company/client. We already have Fil Enterprise coming up, if the client/company is able to wait for it to come live, it should. Unless the company/client has compelling reasons that they absolutely must store the data now which they don't seem to have.

As for inviting large entities, while I do know the right people in those companies to make it happen, I don't see a point in doing it at this stage. As opposed to riding the hype using big names (and not having the infrastructure to back it), I believe it is a more natural course to build on the ecosystem first and gain traction gradually, which is in fact what I am doing (building on the ecosystem).

MRJAVAZHAO commented 2 years ago

This is a good experiment. Binghe is willing to participate in this program if needed.

galen-mcandrew commented 2 years ago

We've discussed this in a governance call, and generally it is getting support. The biggest concern so far has to do with more regional notaries adding support. From tracking the comments, it seems like the list of notaries interested in performing the diligence for this client is below:

  1. ByteBase, f01105814 (Lead Notary), GCR
  2. Fenbushi, f012939, GCR
  3. Firefly, f01818099, Asia
  4. Pluskit, f01846444, Asia
  5. Metawave, (@MetaWaveInfo ), f01840229, Oceania
  6. Tianji Studio (@liyunzhi-666 ), f01841196, GCR
  7. Kernelogic (@kernelogic ), f01795034, NA
  8. Binghe Web3.0 Lab (@MRJAVAZHAO ), f01103162, GCR

Can these 8 notaries please give a reaction emoji to signal the above is correct and show support?

Additionally It is unclear if there are more notaries interested in joining this proof of concept proposal. @Tom-OriginStorage & @cryptowhizzard I see some great questions above. Would you like to be added to this group of notaries to perform diligence and approve DataCap requests? Please reply here to let us know!

galen-mcandrew commented 2 years ago

Additionally, I have a somewhat modified proposal for the implementation of this proof of concept. The Fil+ community is working to reduce the operational overhead, increase efficiency, and remove barriers where possible. Additionally, our goal is to learn from the incremental successes of these different proposals. Rather than exactly replicate Project Antarctic, I think there are already some lessons that we could incorporate here.

Specifically, I propose:

Advantages:

Would like to hear from @swatchliu & @Beacon-Edu about this idea, since it will change the LDNs that get created. If we move forward with this plan, then I think we should also update the parent comment in this proposal with the following details:

Attaching a sample diagram of the proposed structure.

Aaronn85 commented 2 years ago

@Beacon-Edu hello, I am the owner of node f01830428. I am very interested in your project and I have sufficient FIL to pledge. In addition, I have nodes in both mainland China and Hong Kong. Do you have any spots left? Please contact me through Slack if you do.

Tom-OriginStorage commented 2 years ago

We've discussed this in a governance call, and generally it is getting support. The biggest concern so far has to do with more regional notaries adding support. From tracking the comments, it seems like the list of notaries interested in performing the diligence for this client is below:

  1. ByteBase, f01105814 (Lead Notary), GCR
  2. Fenbushi, f012939, GCR
  3. Firefly, f01818099, Asia
  4. Pluskits, f01846444, Asia
  5. Metawave, (@MetaWaveInfo ), f01840229, Oceania
  6. Tianji Studio (@liyunzhi-666 ), f01841196, GCR
  7. Kernelogic (@kernelogic), f01795034, NA
  8. Binghe Web3.0 Lab (@MRJAVAZHAO ), f01103162, GCR

Can these 8 notaries please give a reaction emoji to signal the above is correct and show support?

Additionally It is unclear if there are more notaries interested in joining this proof of concept proposal. @Tom-OriginStorage & @cryptowhizzard I see some great questions above. Would you like to be added to this group of notaries to perform diligence and approve DataCap requests? Please reply here to let us know!

Yes, we are willing to be part of the notaries to perform diligence and approve DataCap requests. But we definitely still need to hear back from @Beacon-Edu on their explanations for pushing as Project-Beacon opposed to waiting for Fil-Enterprise.

Kevin-PiKNiK commented 2 years ago

@Beacon-Edu : Our team is a North American notary always willing to support opportunities like these, where possible. However, can you help me understand the composition of this private data?

You claim this is "3,600 videos of lessons recordings". At 11.8PiB, that means each video is greater than 3TiB. That size is bordering unbelievable for an educational video.

Here in the United States, it is very common for our universities to record educational courses. At my own alma mater, Harvard University recorded nearly every lecture. The average video was 100-250MiB, which was roughly 1 hour long at 480P and 15 frames per second. Why are your videos more than 12000x bigger? Even 4K videos at one hour do not reach the multi-TiB scale.

Again, we are happy to support, but we need to first understand the need to store educational videos that are each greater than 3x10^6MiB.
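The size estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch using only the figures quoted in this thread (11.8PiB, 3,600 videos, and 250MiB as the upper end of the lecture sizes mentioned); the names are illustrative.

```python
TIB_PER_PIB = 1024
MIB_PER_TIB = 1024 * 1024

def avg_video_size_tib(total_pib, video_count):
    """Average size per video in TiB if the whole data set were videos."""
    return total_pib * TIB_PER_PIB / video_count

avg_tib = avg_video_size_tib(11.8, 3600)   # a little over 3 TiB per video
avg_mib = avg_tib * MIB_PER_TIB            # more than 3 x 10^6 MiB
ratio_vs_250mib = avg_mib / 250            # comparison against a 250 MiB lecture
print(f"{avg_tib:.2f} TiB per video, about {ratio_vs_250mib:,.0f}x a 250 MiB lecture")
```

The calculation assumes the full 11.8PiB consists of the 3,600 videos alone, which is the reading being questioned here.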

cryptowhizzard commented 2 years ago

We've discussed this in a governance call, and generally it is getting support. The biggest concern so far has to do with more regional notaries adding support. From tracking the comments, it seems like the list of notaries interested in performing the diligence for this client is below:

  1. ByteBase, f01105814 (Lead Notary), GCR
  2. Fenbushi, f012939, GCR
  3. Firefly, f01818099, Asia
  4. Pluskits, f01846444, Asia
  5. Metawave, (@MetaWaveInfo ), f01840229, Oceania
  6. Tianji Studio (@liyunzhi-666 ), f01841196, GCR
  7. Kernelogic (@kernelogic), f01795034, NA
  8. Binghe Web3.0 Lab (@MRJAVAZHAO ), f01103162, GCR

Can these 8 notaries please give a reaction emoji to signal the above is correct and show support? Additionally It is unclear if there are more notaries interested in joining this proof of concept proposal. @Tom-OriginStorage & @cryptowhizzard I see some great questions above. Would you like to be added to this group of notaries to perform diligence and approve DataCap requests? Please reply here to let us know!

Yes, we are willing to be part of the notaries to perform diligence and approve DataCap requests. But we definitely still need to hear back from @Beacon-Edu on their explanations for pushing as Project-Beacon opposed to waiting for Fil-Enterprise.

Ack. Same here.

Destore2023 commented 2 years ago

Hi @Kevin-PiKNiK, Thanks for the point. You might have a misunderstanding regarding the dataset breakdown.

Please take a look at sheet 2, column 4, "Course Live Recording". Let's do some easy math: 10742/7000/50/32*1024=0.98GiB

Besides, since the client is at the top level among education groups in China, each classroom is equipped with a 4MP network camera recording H.265+ at 720P and 30 fps. It's a very reasonable size for the client.
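For readers following the arithmetic, the expression quoted above evaluates as shown below. The thread does not state what each divisor (7000, 50, 32) represents, so the parameters are left unnamed, and reading the first figure as TiB (with the factor 1024 converting to GiB) is an assumption.

```python
def per_unit_gib(total_tib, divisor_a, divisor_b, divisor_c):
    """Evaluate total / a / b / c * 1024, as written in the comment.

    Assumption: `total_tib` is in TiB, so multiplying by 1024 gives GiB.
    The meaning of the three divisors is not given in the thread.
    """
    return total_tib / divisor_a / divisor_b / divisor_c * 1024

result_gib = per_unit_gib(10742, 7000, 50, 32)
print(f"{result_gib:.2f} GiB")
```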

Beacon-Edu commented 2 years ago

Thanks @galen-mcandrew for the feedback. We have now collected support from more than 5 notaries across 4 regions. Whether or not other notaries would like to join in supporting us, we appreciate all the input and suggestions.

Besides, there are indeed many SPs like @Aaronn85 who have contacted me on Slack hoping to join the Beacon project. We will consider them carefully and arrange further communication.

Thanks to the great advice and guidance from ByteBase during all this time. As per Galen's suggestion, we would like to use the single multisig notary entity with all signers (same as f01858410), and we intend to launch 12 applications, each with a unique client address. Each client address is allowed to send data to one or several nodes under one SP.

In particular, such a huge pledge requirement can be very challenging for many SPs, and may involve the participation and rotation of many SPs.

Of course we will keep the progress updated via Google sheet (https://docs.google.com/spreadsheets/d/18XDQkjlmWJ_BnQ-ygGszg8tFIC_36LHtOBQytHpRGCA/edit#gid=0) to ensure the transparency to the community.

In short, we will operate these applications smoothly, and we are willing to involve more SPs, Fil borrowers, and technical supporters joining this ecosystem.

jhookersyd commented 2 years ago

Hi @swatchliu,

To help everyone regarding the data size of this proposal, could you give me these facts? I ran a digital agency for 10 years so I can help frame this better for you.

How many hours of live raw footage do you want to store on Filecoin? How many hours of post-production content/video you actually play on screens do you want to store on Filecoin?

I need these metrics for raw footage and delivered content.

Video

Audio

Thanks! Jonathan

Destore2023 commented 2 years ago

Thanks, @jhookersyd, for your question. For us it is also a good opportunity to understand the video-related field. After talking with the technical department of our client, we have summarized the data we learned in the following table.

Before we present the data, please refer to chart 2 in the proposal. Since the first three categories differ in terms of camera equipment (including smartphones) and specifications, we will not go into detail here.

For course live recording, please refer to the table below for the number of cameras, scales, and the corresponding data. Because our client requires all data to be maintained for 30 days, if we have to convert the data that our client will store in Filecoin to hours accurately, it will be a total of about 55 hours for the 51,256 cameras in their schools, which is actually far less than 1/10 of the data that our customer archived.

image

For parameters related to classroom photography equipment please refer to the following links, I believe you can find all the metrics there, and if you have any questions please feel free to contact me!

Corridor Camera: https://www.hikvision.com/en/products/IP-Products/Network-Cameras/Pro-Series-EasyIP-/ds-2cd2t87g2p-lsu-sl/

Classroom Camera: https://www.hikvision.com/en/products/IP-Products/Network-Cameras/DeepinView-Series/ids-2cd7186g0-izs/

Playground & Parking Camera: https://www.hikvision.com/en/products/IP-Products/PTZ-Cameras/Ultra-Series/ds-2dp1618zixs-de-440-f0--p4-/

galen-mcandrew commented 2 years ago

@Beacon-Edu I see the 12 open LDN issues, and this spreadsheet, which is helpful. I have some requests to help track this though.

Of course we will keep the progress updated via Google sheet (https://docs.google.com/spreadsheets/d/18XDQkjlmWJ_BnQ-ygGszg8tFIC_36LHtOBQytHpRGCA/edit#gid=0) to ensure the transparency to the community.

LDN Issue | Storage Provider | SP ID | minerID | Client ID
444 | Wuji Blockchain | f0217419 | f0431476 | f1662iakcjszdl2shygg4ce2eb2bv2a3bhlii6gpa

galen-mcandrew commented 2 years ago

Starting this issue for creating the multisig and awarding the DataCap from root key holders. https://github.com/filecoin-project/notary-governance/issues/576

jhookersyd commented 2 years ago

It's amazing at the top of this application it's all about children's education...selling the dream. But when you get down to it, it's really video footage of car parks and corridors.

I think we can drop “humanity's most important information” on this project.

I'm happy for this to go forward if we're ok for Filecoin to be the security footage backup platform of the world.

Kevin-PiKNiK commented 2 years ago

Wait, your most recent response to @jhookersyd is unfair, @swatchliu . @Beacon-Edu originally said these "data sets are the output of our online and offline educational materials from the past 10 years, including courses, documents, and classroom recordings videos."

Now, you're saying that this is just security footage backup for the benefit of the data owner and the SPs who store this data, who will earn block rewards at the economic cost of everyone else who is part of the Filecoin island economy. This is all encrypted and useless to everybody else, whilst conferring tremendous financial upside to those who are participating in this scheme.

The inconsistency in this LDN is untenable. If FIL+ is just a mining game to participants in this ecosystem, then every SP should also load up their capacity with junk data and continue to make a fool of the Filecoin network.

jhookersyd commented 2 years ago

@galen-mcandrew @dkkapur @jnthnvctr

Considering the new information about this project… Security footage, not education.

As a community, should we chat about it? How about next week's Notary call, 16:00 PT?

Happy for you/FF to hit go, but you will be setting a big precedent here.

UnionLabs2020 commented 2 years ago

Please show the necessary respect when commenting, as some people may not understand the education situation in China. Please add me to the list of supporting notaries. Thank you.

dkkapur commented 2 years ago

@jhookersyd thanks for the additional digging here. always great to see someone using their expertise in a particular field to bring additional value to the ecosystem.

Going to add some opinions here - would love to hear each of your thoughts.

Proposing that we get a little bit more information from notaries that did decide to support this - can you share more information on what the potential use cases of this data are? Thank you - @swatchliu @Fenbushi-Filecoin @fireflyHZ @PluskitOfficial @MetaWaveInfo @liyunzhi-666 @kernelogic @MRJAVAZHAO @UnionLabs2020!

@Beacon-Edu - as an applicant for DataCap, I do agree that the burden is on you to prove your trustworthiness and the value of your particular use case. Being as transparent as possible is key in earning trust with the community.

Really important to have this kind of discourse as we continue to grow the scope of the Fil+ program and ensure that Filecoin continues to deliver value. Thanks for engaging with these topics.

jhookersyd commented 2 years ago

Hi @dkkapur,

Clearly, the current notary governance structure isn't capable of scaling the onboarding of private encrypted enterprise data. We all know this.

I agree with your point 1, 2, 4, 8, and 11.

Point 3 - The disagreement is on 2 subjects:-

  1. The usefulness of the data, and
  2. The misleading description of the application

Point 5/6/7 - "Building confidence in specific projects/datasets/clients." Completely agree! If we crack this, we get to be the world's biggest data storage network.

My thoughts on how we fix this notary governance problem for private encrypted data... It's clear to see:-

  1. Notaries should be paid for NDA/KYC/Checking Data
  2. Notaries should stake FIL like SPs to check thoroughly/have skin in the game

For projects above 1PiB, I would:-

  1. Choose 6 notaries at random from a global pool, 4 of the 6 notaries need to sign off the project for the Data cap allowance to be approved
  2. Notaries can't know each other at any time during the sector lifecycle
  3. Give Notaries 1% of the project's FIL block reward. 5PiB raw data/5 Copies = 1% of 25PiB block reward
  4. Make Notaries stake 3% of 25PiB in FIL to have skin in the game
  5. If one notary finds fraudulent data, the 9% FIL staked from the other three Notaries goes to that exceptional notary
  6. 100% of the lead SPs sectors are terminated if they onboarded fraudulent data
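The percentages in the list above can be checked with a minimal sketch. It assumes that the 1% reward and the 3% stake are both measured against the block reward earned by the sealed copies (raw data times copies), and that `project_reward_fil` is a hypothetical input, since the thread gives no FIL figure. The names `NotaryTerms`, `sealed_pib`, `forfeited_pct` and `to_fil` are illustrative only and not part of any Fil+ tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NotaryTerms:
    """Parameters taken from the proposal; percentages are kept as integers."""
    raw_pib: int = 5          # raw data in PiB (example from item 3)
    copies: int = 5           # number of replicas (example from item 3)
    reward_pct: int = 1       # item 3: notary share of the project's block reward
    stake_pct: int = 3        # item 4: stake per notary, as % of the same reward
    other_notaries: int = 3   # item 5: the "other three" notaries whose stake is forfeited


def sealed_pib(terms: NotaryTerms) -> int:
    """Total sealed capacity the percentages are measured against."""
    return terms.raw_pib * terms.copies


def forfeited_pct(terms: NotaryTerms) -> int:
    """Combined stake (in % of block reward) paid to the notary who finds fraud."""
    return terms.stake_pct * terms.other_notaries


def to_fil(pct: int, project_reward_fil: float) -> float:
    """Convert a percentage into FIL for a hypothetical project block reward."""
    return project_reward_fil * pct / 100


terms = NotaryTerms()
print(sealed_pib(terms))      # 25 PiB, matching "5PiB raw data/5 Copies"
print(forfeited_pct(terms))   # 9, matching "the 9% FIL staked"
```

Whether the 1% is paid per notary or shared among the signing notaries is not stated in the proposal, so the sketch leaves that split out.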

Thoughts on this?

Point 9 - "We have notaries from many regions" How are we tracking that notaries are from the region they actually say they are?

Point 10 - "in more controversial cases like this", That would be great! I would love to hear from the 9 notaries on why security footage of a car park is "Useful".

Point 12 - I built an education company in Hong Kong and Shenzhen for 8 years; it has 1,500 primary schools across Asia. I understand the nuances of the Chinese education system deeply. @UnionLabs2020

If we get the governance structure for private data nailed, FIL will be at 1,000 USD very quickly.

Thank you all!

dkkapur commented 2 years ago

@jhookersyd thanks for addressing individual points. Responding to the set in order.

Point 3 - I agree with you on both. Thanks for correcting.

Points 5/6/7 - Also generally agree with this sentiment, but:

Your proposal on LDNs for enterprise use cases

Point 9 - "We have notaries from many regions" How are we tracking that notaries are from the region they actually say they are?

Point 10/12 - ACK. Let's see what folks have to share!

Tom-OriginStorage commented 2 years ago

It seems that there is a mismatch of what is promised to be stored vs what is actually stored. A clarification is definitely needed from the client, and an update should be made to the original post.

Personally, I feel that surveillance data has little use but does not constitute useless data. Surveillance footage may capture some useful data and be used as evidence of crimes or to assist in investigations. And in this case, it is used to ensure the safety of the children after all. But outside of this use case, surveillance footage serves very little purpose.

And back to the question of storing humanity's most important information. Quite a lot of important footage in humanity's history was captured by surveillance accidentally. But only a very small portion of surveillance data coincidentally became historical moments, and they were never originally meant to be captured. The way I see it, surveillance data is very similar to insurance: it is used to insure against the unexpected, and people do not spend a vast amount of their wealth to purchase insurance. Similarly, I believe that the vast majority of Filecoin's storage should not be dedicated to surveillance footage, but it's acceptable for a small portion to be catered for that.

fireflyHZ commented 2 years ago

We have communicated thoroughly with the Beacon team. Even though the client is currently pregnant, she was still very active in the KYC process. We believe that the client and the requirements are genuine and trustworthy, and we support the project.

Beacon-Edu commented 2 years ago

First of all, I would like to make it clear that fixed filming equipment is not only made for security footage. All of you here are either Filecoin operators or participants, and although we are in different industries, please respect the work we love and dedicate ourselves to.

It's true that I know nowhere near as much about Filecoin as you all do, but as Deep said, the definition of valuable is nothing but subjective. I think the educational videos of our Chinese children are very meaningful, and even everything that happens to children around the classroom is valuable. But apparently for some people it is meaningless, and there's nothing I can do about that. I have no responsibility nor interest in getting into any arguments.

Everyone is entitled to their opinions, but we do not welcome questions that are blind and disregard the facts. We are happy to arrange introductions of data samples under a confidentiality agreement, so please feel free to contact us on Slack (Beacon-Edu)!

PluskitOfficial commented 2 years ago

We strongly support this project! As a newly joined notary, we are highly concerned about the authenticity of the project and the data. Not only did we verify the data provided by the BEACON project, but we also conducted multiple confirmations through internet searches, national corporate qualification verification and other platforms. Yes, it is not common for an education group to participate in Filecoin, but I believe that education is a valuable resource in any country and needs to be stored. It is also inevitable and not hard to understand that data encryption is necessary to protect students and teachers. Besides, the project has already found many SPs, and we believe its future storage plans are credible. Lastly, our platform will also closely follow this project and synchronize their sealing status.

MetaWaveInfo commented 2 years ago

Sharing that we have checked the data samples and completed KYC with the Beacon team, and then chose to support this project. But one thing we'd like to declare is that determining the value of private data is far beyond our ability. This has nothing to do with the qualifications and abilities of notaries. We can only ensure that this client is genuine, the dataset is real and the requirement is reasonable.

So, how about taking it as a pilot project first? This will be another great leap forward for Filecoin.

InteresParte commented 2 years ago

No intention for any unfriendliness or fighting in anything asked below, simply looking for some clarification on details :)

The goal of preserving educational material is certainly good.

There seems to be conflicting information between the initial post and the information swatchliu is giving. No negative intention to swatchliu here, as being lead notary it is natural to want to aid the project, and it is understandable to make mistakes when providing information. It would be very helpful if clarity could be given on some of the conflict:



   1. The number of videos/files comprising the dataset.

Data: The proposal states that the dataset contains over 3600 videos of lesson recordings. The second image in the initial post shows 10742 TiB for 7000 classes, for 50 courses in 32 lessons

Conflict: Swatchliu provided some easy math for us, that 10742/7000/50/32 * 1024 = .98

This is a little confusing. While 10,742 TiB of data divided between 11,200,000 videos (7000 × 50 × 32) does equal about .98 GiB per video, the number of videos in the initial statement does not match the provided math (3,600 vs 11,200,000). If I’ve made a mistake here please correct it! :)
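For anyone who wants to reproduce the arithmetic, here is a minimal sketch using only the figures quoted in this thread (10742 TiB, 7000 classes, 50 courses, 32 lessons, and the 3600 videos stated in the proposal). The variable names are illustrative.

```python
TIB_TO_GIB = 1024

recorded_tib = 10742                      # "Course Live Recording" volume from the second chart
classes, courses, lessons = 7000, 50, 32  # factors used in the notary's calculation
stated_videos = 3600                      # "over 3,600 videos" from the proposal text

# Number of videos implied by multiplying the three factors together.
implied_videos = classes * courses * lessons

# Average size per video under each video count, in GiB.
gib_per_video_implied = recorded_tib * TIB_TO_GIB / implied_videos
gib_per_video_stated = recorded_tib * TIB_TO_GIB / stated_videos

print(f"implied videos: {implied_videos:,}")
print(f"GiB per video (implied count): {gib_per_video_implied:.2f}")
print(f"GiB per video (stated count):  {gib_per_video_stated:.1f}")
```

The two counts differ by a factor of roughly 3,000, which is the mismatch being asked about.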

   2. The content of the videos.

Data: The proposal highlights the value of the dataset, that it contains courses, documentation, and lesson recordings. The answers following up the questions asked here seem to imply the vast majority of the data could be categorized as “surveillance footage” rather than “educational material”.

Note: Please do not use the term “parking lot videos” as a generalization describing the full dataset. This term sounds derogatory, and I would like to believe the bulk of the video is legitimate classroom footage.

Conflict: Classroom footage does not necessarily mean that it is educational. Please consider the difference between “educational videos” and “videos of children in a classroom”. Educational videos implies that the focus and purpose of the recorded video is the presentation of the material, while the second is monitoring and accountability within the classroom.

If 10742 TiB of the 11938 TiB total is truly surveillance video of classrooms and not educational courses, it should be represented properly in the proposal as such and not “Course Live Recording”.

I do struggle to believe that this amount of recorded footage is valuable as an educational resource and is not predominantly used as security footage, but validating this is the job of the notary and tied to their own reputation!

jhookersyd commented 2 years ago

For clarification. I used this data point and asked a factual question. I did not generalise educational videos as car park videos.

Screen Shot 2022-07-23 at 8 39 36 am

Agree with @InteresParte. Lots of data points don't line up here; in short as a community we need more information on the finer details of the video data.

For the next steps... it would be great to have a call to clarify the exact details of this significant project as a community. I think some details are getting lost on GitHub. It would be easier to talk.

MatrixStorage commented 2 years ago

We have verified the authenticity of their data through online meetings and we are happy to support this project.

As a platform for storing data, I think it is paramount for Filecoin to provide the storage services needed by genuine clients. Especially for industries that are less common in web 3.0, it is necessary to build their trust better, to gradually expand the influence and reputation, and eventually to make Filecoin a more inclusive community.

I believe this is also in line with the initial intention of many participants who have always been committed to the ecosystem.

Destore2023 commented 2 years ago

Thanks to all participants in the community for their strong support of the Beacon Project, especially @galen-mcandrew, who was still talking with us about Beacon's detailed questions in the dead of night. As the lead notary, although we have been exhausted these days from handling the client and many SPs, we still feel it is worth it.

@InteresParte, frankly speaking, it's a little difficult for me to understand why you needed to urgently register a new account to repeat your views. Well, I will try to treat your questions as well-intentioned doubts and clarify once again:

1. For the number of videos and files:

a. The dataset includes, but is not limited to, more than 3,600 course videos. These are course materials, and their number is constantly growing.

b. The data set of about 11.6 PiB (11938.74 TiB) is only a small fraction of the Beacon project. If the 11.6 PiB is converted into data for all of the approximately 51,000 cameras (most of which are located in the core teaching area), you can use common sense to calculate how long it can be stored. We think it is very reasonable to judge the data scale of the 11.8P/60P application from this perspective. BTW, the reason we demonstrated the conversion method before was to better illustrate the magnitude of the data and to help people understand it.

c. It is normal that the class videos occupy a large proportion of the data and clearly exceed the course materials and documents.
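The per-camera calculation mentioned in point b can be sketched as follows. The dataset size and camera count come from this comment; the video bitrate is not given anywhere in the thread, so `bitrate_mbps` is a purely hypothetical input and the resulting number of days is only as good as that assumption. The function names are illustrative.

```python
TIB_TO_GIB = 1024
GIB_TO_BYTES = 2**30
SECONDS_PER_DAY = 86_400

dataset_tib = 11938.74   # dataset size quoted in this comment
cameras = 51_000         # approximate camera count quoted in this comment


def gib_per_camera() -> float:
    """Share of the dataset per camera if it were split evenly."""
    return dataset_tib * TIB_TO_GIB / cameras


def days_of_footage(bitrate_mbps: float) -> float:
    """Days of continuous recording per camera at an ASSUMED bitrate (megabits/s)."""
    bytes_per_day = bitrate_mbps * 1_000_000 / 8 * SECONDS_PER_DAY
    return gib_per_camera() * GIB_TO_BYTES / bytes_per_day


print(f"{gib_per_camera():.1f} GiB per camera")
# 2 Mbps is an illustrative value only, not a figure from the applicant.
print(f"{days_of_footage(2.0):.1f} days per camera at an assumed 2 Mbps")
```

Changing `bitrate_mbps` or assuming non-continuous recording changes the result proportionally, so this only illustrates the order of magnitude.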

We don't see any conflict between a, b, c. If someone deliberately takes the overall volume of video within the data and divides it by the number of course materials to reach your hypothetical conclusion, then it is indeed difficult for us to understand your real intention and continue to have a friendly interaction.

2. For the video content: I'd like to reiterate that this is a non-public data storage project, not a usual public dataset. Please don't use your individual perceptions to arbitrarily judge the value of other people's data content. This is also the reason why our client needs to collect support from more than 5 notaries.

@InteresParte, no matter which notary you represent or which famous university you graduated from, I hope you can seriously study and understand the community rules. Openness and transparency are our behavioral requirements. Maybe one day we can meet and talk about the future of Filecoin.

sjcdltx commented 2 years ago

Sorry, late to the conversation (just had a baby). Simon from DLTx here (new to GitHub). Looks like we have a few conflicting descriptions of the content in this thread: the first post is about education, and lower down we see the video content has corridor/car park footage in it. As a network we need to operate with maximum integrity and transparency at all times. Would be great to have a community call about this project as @jhookersyd said.

If nothing else we should establish types of data that we are happy with being FIL+. I have many hotels in my connections with security footage (corridors, car parks, etc). I also have EdTech contacts with video courses who could be customers. If we are ok with both types, I will bring them all on. If only one type then fine too. Thanks.

Destore2023 commented 2 years ago

image

screenshot here!