Hi, I'm researching provenance and license/consent risk for clients. The risk being managed is "risk of litigation requiring derivative works such as LLMs to be taken down as a result of copyright violation".
I can't immediately find any resources regarding dolma that address this. I can see some ways it could be addressed, for example by only crawling content that carries a clear statement of its license (such as Creative Commons).
Apologies if this was made clear somewhere!
Thanks in advance…