Closed NotFenixio closed 6 months ago
i mean, i'm mostly working on Pyratch now, but this'd be really helpful for all alternative frontends, so i support this idea.
Awesome! We need a name... Any suggestions?
I'll just call it ScratchedDB for now.
I really like this idea, however, we'd need reliable hosting with the closest to 100% uptime we can get. I have an AWS account so we could try that, but it's really expensive so we'd need to get our money's worth out of it.
As for the name, we could call it Voyager. (Thanks ChatGPT)
We could also try writing it in Rust for funzies.
Voyager then! I'll rename the repo in a moment.
The idea is to create a locally-deployable ScratchDB, so whoever downloads Snazzle or any other alternative frontend, will be hosting its own ScratchDB.
Also, I discovered that AWS has a 12-month free tier for Amazon EC2 which we can use to deploy this new thing for 1 year. (A simple Glitch project with UptimeRobot could do the thing too)
And for the Rust thing, I don't know... Let's try doing it in Python and leaving that for the future Snazzle Svelte/Rust port.
Rust rewrite of Svelte??? /j
Random suggestions: Use requests and Beautiful Soup, since that uses way less RAM than browser automation (there are also RSS feeds, but they don’t have every post)
Have a centralized server to reduce load on scratch but clients have a local cache/scraper in case the server goes down
Clients can choose whether they want stale data immediately or updated data that takes longer to get
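A minimal sketch of the "parse plain HTML instead of driving a browser" suggestion. To stay dependency-free it uses the standard library's `html.parser` as a stand-in for Beautiful Soup, and the markup in `SAMPLE_HTML` (including the `topic` class) is invented for illustration; the real Scratch forum HTML may look different.

```python
from html.parser import HTMLParser

# Hypothetical markup: an assumption, not the real Scratch forum HTML.
SAMPLE_HTML = """
<table>
  <tr><td><a class="topic" href="/discuss/topic/1/">Welcome</a></td></tr>
  <tr><td><a class="topic" href="/discuss/topic/2/">Voyager ideas</a></td></tr>
  <tr><td><a href="/users/someone/">someone</a></td></tr>
</table>
"""


class TopicLinkParser(HTMLParser):
    """Collect (href, title) pairs for <a class="topic"> links."""

    def __init__(self):
        super().__init__()
        self.topics = []
        self._href = None  # href of the topic link we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "topic":
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a topic link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.topics.append((self._href, "".join(self._text).strip()))
            self._href = None


def parse_topics(html):
    """Return the topic links found in an HTML string."""
    parser = TopicLinkParser()
    parser.feed(html)
    return parser.topics
```

In a real scraper the HTML string would come from an HTTP response body rather than a constant.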
I think we could combine Voyager with a system on the client that checks if the RSS data has new posts that Voyager doesn't have yet, in which case it sends this data to the central Voyager server and then displays the new data to the user.
It's probably better to avoid sending requests to Scratch if we don't have to. What I meant was that if the client doesn't care about having the most up-to-date data, it can tell the server that, so the server doesn't have to make a request to the Scratch servers.
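One way the "stale is fine" option could look on the server side, as a rough sketch. The class name `TopicCache`, the `allow_stale` and `max_age` parameters, and the `fetch` callable are all hypothetical; nothing here is taken from Voyager's actual code.

```python
import time


class TopicCache:
    """Serve cached topics, refetching only when the caller insists on fresh data."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch    # hypothetical callable that requests Scratch
        self._clock = clock    # injectable so the behaviour can be tested
        self._entries = {}     # topic_id -> (fetched_at, data)

    def get(self, topic_id, allow_stale=True, max_age=60.0):
        entry = self._entries.get(topic_id)
        if entry is not None:
            fetched_at, data = entry
            # A cached copy is good enough if the client accepts stale data
            # or the copy is still younger than max_age seconds.
            if allow_stale or self._clock() - fetched_at <= max_age:
                return data
        # Cache miss, or the client wants fresh data: go to Scratch.
        data = self._fetch(topic_id)
        self._entries[topic_id] = (self._clock(), data)
        return data
```

With `allow_stale=True`, a topic that has been fetched once never triggers another request to Scratch, which is the load reduction described above.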
To that end, we should also add rate-limiting (maybe only 3 requests a second?) to avoid stressing the Scratch servers. We may need to increase this number based on website traffic, though. Ideally the server should do this automatically somehow.
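A rough sketch of the rate limit, assuming a simple "minimum gap between requests" approach. The `RateLimiter` name and its parameters are made up for illustration, and the automatic adjustment mentioned above is not implemented here.

```python
import time


class RateLimiter:
    """Space out calls so that at most max_per_second happen each second."""

    def __init__(self, max_per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next request may go out

    def wait(self):
        """Block until the next request is allowed; return the seconds waited."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        self._next_allowed = now + self._interval
        return max(delay, 0.0)
```

The scraper would call `limiter.wait()` right before every request to Scratch. Changing the limit later only means constructing the limiter with a different `max_per_second`.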
We don't need any browser automation tools; the Scratch forums are easy to fetch with plain HTTP requests.
If we are using Rust, we can use the reqwest and scraper crates for data and serve it over actix. I can work on it whenever I have free time
I think 3 requests/second is fine
We're building it in Python, but we need some help with specific functions that require indexing Scratch. https://github.com/users/NotFenixio/projects/3/views/1
I’m working on a scraper right now that uses SQLite
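For illustration only, here is roughly what storing scraped posts in SQLite could look like. The table layout and the function names are assumptions, not the schema of the scraper mentioned above.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the posts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS posts (
            post_id  INTEGER PRIMARY KEY,
            topic_id INTEGER NOT NULL,
            author   TEXT NOT NULL,
            body     TEXT NOT NULL
        )
        """
    )
    return conn


def save_post(conn, post_id, topic_id, author, body):
    # INSERT OR REPLACE so re-scraping an edited post overwrites the old row.
    conn.execute(
        "INSERT OR REPLACE INTO posts (post_id, topic_id, author, body) "
        "VALUES (?, ?, ?, ?)",
        (post_id, topic_id, author, body),
    )
    conn.commit()


def posts_in_topic(conn, topic_id):
    """Return (post_id, author, body) rows for one topic, oldest first."""
    rows = conn.execute(
        "SELECT post_id, author, body FROM posts "
        "WHERE topic_id = ? ORDER BY post_id",
        (topic_id,),
    )
    return rows.fetchall()
```

Passing a file path instead of `":memory:"` keeps the data between runs.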
Depending on how much load there is I will probably be able to host it
Since Voyager is already being made by @NotFenixio, I had an idea.
When you both get your ideas usable in Snazzle, we can vote on the better one and we’ll use that. I might create my own entry as well.
The idea is to create a more reliable service, so we should use the cloud for maximum uptime.
I’m going to initially run mine on my pi but if you find a free cloud service I can switch it
- Only retrieves 25 posts compared to ScratchDB's 50 posts.
Why only 25? Also, you should maybe announce these things in Voyager's repo.
I'll make an announcement in the Scratch forum thread saying that all Voyager-related concerns should be funneled to the Voyager repo.
Also, we could probably rename it to Voyageur to avoid being confused with the—frankly fascinating—space probe.
Why only 25?
Scratch only shows the first 25 topics per page. I'm working on improving that in the broken-more-topics branch in the Voyager repo.
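A small sketch of the arithmetic behind serving 50 items per page when Scratch only returns 25 at a time. The two page sizes come from this thread; the function name is hypothetical and this is not necessarily what the branch does.

```python
SCRATCH_PAGE_SIZE = 25  # items per page on Scratch, per this thread
API_PAGE_SIZE = 50      # items per page in ScratchDB, per this thread


def scratch_pages_for(api_page):
    """Return the 1-based Scratch page numbers that cover one 1-based API page."""
    if api_page < 1:
        raise ValueError("api_page must be >= 1")
    first_item = (api_page - 1) * API_PAGE_SIZE  # 0-based index of first item
    last_item = first_item + API_PAGE_SIZE - 1   # 0-based index of last item
    first_page = first_item // SCRATCH_PAGE_SIZE + 1
    last_page = last_item // SCRATCH_PAGE_SIZE + 1
    return list(range(first_page, last_page + 1))
```

So one ScratchDB-sized page costs two requests to Scratch, which matters for any rate limit.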
Also, you should maybe announce these things in Voyager's repo.
Yeah, I just wanted y'all to get updates.
Also, we could probably rename it to Voyageur to avoid being confused with the—frankly fascinating—space probe.
I think it's better to keep the current name. Another name change could probably break more things.
Voyager should be kept as a different project, developed simultaneously with Snazzle if you are planning to make it as an alternative to ScratchDB
It should be split into two parts: Pioneer, the scraper, and Horizons, the DB.
Pioneer and Horizons would work together to form Voyager. Pioneer's sole purpose would be to keep scraping, and Horizons' sole purpose would be to store the data scraped by Pioneer.
Instead of being built in Python, it should be built with Go's Colly scraping framework, as it is scalable, efficient, fast, runs in parallel, and has several built-in functions. The DB would be the C++-based ScyllaDB if data is stored locally, or Google Cloud if the application should be cloud-based.
Also, should the Voyager system be run locally, there should be at least 8GB of RAM and 2TB of high-speed storage per node. The ideal candidate for a local server would be 4 Raspberry Pi Compute Module 4s specced with 8GB of RAM and no eMMC. The carrier board for it would be the Turing Pi 2.5. Each node would have 2TB of NVMe storage. The overall cost for it would be about 1000$. This would easily be able to handle every forum post ever created, plus every forum post created in the future, for about 5 years.
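To make the Pioneer/Horizons split concrete, here is a toy sketch in Python (the language the thread settled on earlier, not the Go/Colly stack proposed here). The scraper only produces posts and the store only saves them, with a queue in between. The `scrape` callable, the dict-backed store, and all function names are placeholders.

```python
import queue
import threading


class Horizons:
    """Stand-in store: a dict guarded by a lock (a real one would be a database)."""

    def __init__(self):
        self._posts = {}
        self._lock = threading.Lock()

    def store(self, post):
        with self._lock:
            self._posts[post["id"]] = post

    def count(self):
        with self._lock:
            return len(self._posts)


def pioneer(topic_ids, scrape, out):
    """Scrape each topic and push its posts onto the queue; never touches storage."""
    for topic_id in topic_ids:
        for post in scrape(topic_id):
            out.put(post)
    out.put(None)  # sentinel: scraping finished


def drain(out, horizons):
    """Move posts from the queue into the store until the sentinel arrives."""
    while True:
        post = out.get()
        if post is None:
            break
        horizons.store(post)


def run(topic_ids, scrape):
    out = queue.Queue()
    horizons = Horizons()
    consumer = threading.Thread(target=drain, args=(out, horizons))
    consumer.start()
    pioneer(topic_ids, scrape, out)
    consumer.join()
    return horizons
```

Because the two halves only share the queue, either side could be swapped out (a different scraper, a different database) without touching the other.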
The overall cost for it would be about 1000$.
That might be a bit of a problem.
Replying to dynamixbot...
I mean it's not a bad idea but we don't have that kind of money. Why would we buy a 300$ carrier for such a small project? A 3D printed case is like 15$. Also, the technology is not ideal. I don't think anyone here knows Go.
Voyager should be kept as a different project, developed simultaneously with Snazzle if you are planning to make it as an alternative to ScratchDB
Voyager is a separate project.
I know I said on the snazzle topic that voyager would be canceled but if we implement it the way @dynamixbot said, it would be exactly the same as what I was trying to do :P so voyager is un-canceled now
I'll run it on my Pi 4B once I get it upgraded. I'm gonna get a super fast, high capacity SSD for it and a better fan so it can't overheat. I might also get a Pi 5 to run it in a cluster with and have them be able to access the same storage but that is incredibly tentative at the moment.
Also, to address the more alarming thing: I apologize for ending the project (it's not now lol) without any warning to team members. I should have asked you about it before making an executive decision and announcement. Going forward I will make contact with all of you before making any drastic decisions.
I mean it's not a bad idea but we don't have that kind of money. Why would we buy a 300$ carrier for such a small project? A 3D printed case is like 15$. Also, the technology is not ideal. I don't think anyone here knows Go.
I guess we could scale in the future when required. Also, I know Go and I can make a Pioneer prototype. Only Horizons needs to be handled with Google Cloud free tier.
I know I said on the snazzle topic that voyager would be canceled but if we implement it the way @dynamixbot said, it would be exactly the same as what I was trying to do :P so voyager is un-canceled now
Okay, so are we doing it or not? We can make a PCB which carries the Compute Module 4. We can then in the eventual future scale up when we start lagging. We should do this with Raspberry Pi only as it is convenient and cheap.
I can set up a raspberry pi as a server
The overall cost for it would be about 1000$.
That might be a bit of a problem.
I mean ScratchDB costs about 100$ every month to maintain (estimated figure)
Do you have a Compute Module 4? That way, I can design a PCB in <1 month and the PCB would only cost about 5$. It would be suited to our needs.
Where is that 100$ coming from? If it’s the cost of the hardware, it’s realistic, but that’s a one-time cost. Internet costs could be that high (I have no idea how much data ScratchDB serves)
I have a normal Pi 4, which is the same but with more IO. And how would you get the PCB to me?
who even are you and why do you care about this? plus, we don't need a custom-made PCB for a small project when a Pi 4 would work just as well.
also, nobody seemed to mention that Voyager (iirc) is designed to be deployed locally. even if we were to have a public instance, i have a much better idea than spending $1k and $100 a month:
somebody uses a Pi that they already have, and they buy 1 or 2 usb hard drives. it's almost like it's an extremely obvious solution and doesn't require custom PCBs and shit, and would be very cheap.
I am dynamixbot, dynamicsofscratch on scratch and I care about this because ScratchDB is a hassle, Snazzle looks really cool and I have some cool ideas.
I guess we can go with the default Pi 4
Where is that 100$ coming from? If it’s the cost of the hardware, it’s realistic, but that’s a one-time cost. Internet costs could be that high (I have no idea how much data ScratchDB serves)
Internet costs and traffic costs.
yeah, but it's not like any of us don't pay for internet already.
What if instead of scraping we just use the Scratch API? It is already documented on the wiki and can be used to do everything that can already be done on Scratch. We would just have to focus on getting the extra features we want ready.
there's no forums API. that's the entire point of ScratchDB, and now Voyager.
Also, fetching and parsing forum posts on demand is likely less optimal and more taxing on Scratch's servers than if Voyager scraped the forums using a few indexers
Well at least we can use it for Snazzle's projects and profiles and players and stuff.
Hey also how many bots would we need to scrape the forums?
Like equal to how many users active on the forums?
How fast do you want it to be? Also the forums aren’t session based so # of bots doesn’t really mean anything
I don't think the # of bots refers to number of accounts being used, just the number of scraping processes running in parallel
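If "bots" means parallel scraping processes, a sketch could be as small as a worker pool. The `fetch` callable and the default of 4 workers are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(topic_ids, fetch, workers=4):
    """Fetch every topic using `workers` parallel threads.

    Results come back in the same order as topic_ids.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, topic_ids))
```

More workers only helps up to whatever request budget is agreed for Scratch's servers (the 3 requests a second floated earlier would cap the total, not each worker).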
Oh okay.
Well, I meant the opposite. I was thinking that instead of scraping everything up front and storing it in the cloud, we would only download a page when someone needs or requests it. And once a page is loaded, other people don't have to go through the slow first view of a forum.
I've deleted the Voyager repository on my profile in favor of the new organization for Voyager, GetVoyager. The new Voyager version will be developed in the Voyager repository. By the way, should we have 3 separate repositories for Pioneer, Horizons, and the actual service?
Subdirectories would be better than opening a whole new repository.
need people for voyager
@redstone-dev need people for voyager
I think we could all work on Voyager and Snazzle at the same time, though I think you and @NotFenixio should decide on that, since you're basically the heads of the project.
LGTM.
This idea is very good, I'll support this
Recently, ScratchDB has been acting up, causing problems for Snazzle, which relies on it. But we can't just toss ScratchDB aside because it's our main source of info. So, here's an idea: Let's create our own ScratchDB.
To do this, we'd learn from ScratchDB's way of doing things. We can use Playwright, Selenium, and/or Atoma to grab info from Scratch, and then BeautifulSoup to clean it up and get the data we need.
Now, this idea is kind of like a trial run, like taking a poll. I want to know what y'all think about it. Would this be a good move?