h0n24 commented 3 years ago

I wanted to give you a few tips, but there was no time for that in class yesterday.

Overall, it's good code. I checked whether anyone had come up with this idea before and found no one, so you get points for uniqueness and creativity! :)

Among the mistakes that jump out at first sight: it's considered a bad habit to name a variable "x". Even "msg" could be considered one. In your code, for example: "for x in msg:".

You probably already know why it's a bad habit, so I won't explain it here. If not, it's easy to google: look up habits that make code easy for other programmers to read. Sometimes I hear programmers say "but hey, I wrote the code just for myself"; well, as you can see, I was reading it too. ;) Always think about the other people who might read the code in the future. In the long run it helps you as well, when you return to the code after a few months or years.
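To make the naming point concrete, here is a small before/after sketch. The loop body is hypothetical (only "for x in msg:" comes from the actual code), and the function and variable names are made up for illustration:

```python
# Before: the argument and loop variable names say nothing about the data.
def count_hits_short(msg, bad):
    n = 0
    for x in msg:
        if x in bad:
            n += 1
    return n


# After: identical logic, but each name states what it holds.
def count_banned_words(message_words, banned_words):
    banned_count = 0
    for word in message_words:
        if word in banned_words:
            banned_count += 1
    return banned_count
```

Both functions behave the same; the second one can be read without opening the caller to find out what "msg" and "x" are.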

How to improve the code?

Well, there are multiple ideas that could be included. But overall, for any algorithm, it's a great habit to test it extensively. Test it in a real environment. And test it against other algorithms.

As for testing in a real environment, I believe you've already done that on Discord. That's a good start.

In your case, I'd use what you learned in yesterday's lesson and scrape the posts of any Twitter account. One you might find funny is "unfiltered developer's commit messages" at https://twitter.com/gitlost, but nearly any Twitter account will do. Of course, there are many prepared Twitter datasets already, but hey, try to make your own dataset instead. You should have the skills for that by now, so why not practice them and get even better? I bet you can do it! :)

If you'd rather test it in real time, which is always more fun, I'd try the Twitch API. You can connect to any Twitch channel and test even thousands of messages a minute.
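A rough sketch of how reading a Twitch channel could look, assuming Twitch chat is reachable over its IRC interface at irc.chat.twitch.tv:6667 and that a read-only anonymous nick of the form "justinfan12345" is accepted. Both are assumptions to verify against the Twitch documentation; the function names are made up for illustration:

```python
import socket


def parse_chat_line(raw_line):
    """Return (user, text) for an IRC PRIVMSG line, or None for anything else."""
    if not raw_line.startswith(":"):
        return None
    prefix, _, rest = raw_line[1:].partition(" ")
    command, _, params = rest.partition(" ")
    if command != "PRIVMSG":
        return None
    _channel, _, text = params.partition(" :")
    user = prefix.split("!", 1)[0]
    return user, text


def stream_chat(channel, host="irc.chat.twitch.tv", port=6667):
    """Yield (user, text) pairs from a channel (assumed anonymous login)."""
    with socket.create_connection((host, port)) as connection:
        connection.sendall(b"NICK justinfan12345\r\n")  # assumption: anonymous nick
        connection.sendall(f"JOIN #{channel}\r\n".encode("utf-8"))
        buffer = ""
        while True:
            data = connection.recv(4096)
            if not data:
                break
            buffer += data.decode("utf-8", errors="replace")
            # Keep the last (possibly incomplete) line in the buffer.
            *lines, buffer = buffer.split("\r\n")
            for line in lines:
                if line.startswith("PING"):
                    # Answer keep-alive pings so the server keeps the connection.
                    connection.sendall(b"PONG :tmi.twitch.tv\r\n")
                    continue
                parsed = parse_chat_line(line)
                if parsed:
                    yield parsed
```

Each pair yielded by stream_chat("some_channel") could then be fed straight into the filter under test.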

As for testing against other algorithms: once you have the real-world data prepared, you can run your own filter side by side with some other profanity filter API (a few can be found here: https://rapidapi.com/collection/profanity-filter) and compare the outcomes. How many false positives do they have? How fast are they? Everything can be compared. You can save the differences, build your own comparison data, and make a website that compares your algorithm with the others. I can promise you that having a real-world data comparison will always convince investors. Having data that proves where your product is better is a far stronger approach than just saying "it is better". Proving it > saying it.
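A minimal sketch of such a comparison, assuming each filter can be wrapped as a function that takes a message and returns True when it flags it. The two filters and the tiny labeled dataset below are stand-ins invented for illustration, not your algorithm or any real API:

```python
import time

BANNED_WORDS = {"darn"}  # placeholder word list


def substring_filter(text):
    # Flags a message if a banned word appears anywhere, even inside another word.
    lowered = text.lower()
    return any(banned in lowered for banned in BANNED_WORDS)


def whole_word_filter(text):
    # Flags a message only if a banned word appears as a separate word.
    return any(word in BANNED_WORDS for word in text.lower().split())


def evaluate_filter(filter_function, labeled_messages):
    """Count false positives/negatives and measure total run time."""
    false_positives = 0
    false_negatives = 0
    start = time.perf_counter()
    for text, is_profane in labeled_messages:
        flagged = filter_function(text)
        if flagged and not is_profane:
            false_positives += 1
        elif is_profane and not flagged:
            false_negatives += 1
    elapsed_seconds = time.perf_counter() - start
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "seconds": elapsed_seconds,
    }


labeled_messages = [
    ("darn it", True),
    ("she is darning socks", False),
    ("hello there", False),
    ("DARN", True),
]

results = {
    name: evaluate_filter(function, labeled_messages)
    for name, function in [
        ("substring", substring_filter),
        ("whole_word", whole_word_filter),
    ]
}
```

With real scraped messages and a wrapper around a remote API in place of the stand-ins, the same results dictionary could be saved and published as the comparison data.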

— John from the IT Step Academy

SIM7K commented 3 years ago

Hi, thanks for taking the time out of your day to write that, I appreciate it.

About the variable names: yeah, I should improve on those, it shows. I already had a few issues with Python not working exactly how I intended, and had to redo the thing.

For testing: yeah, I already found multiple issues in the original design, and the code that is now on GitHub is the revised version. To date, no other issues have been found.

The ideas you had about Twitter datasets were cool. I looked at the RapidAPI algo and it just seems to be a massive "in" search for items in some database: better at detecting swears but massively taxing on CPU cycles. I still have to investigate it.
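A quick sketch of why the cost of an "in" search depends on the container, assuming the word database fits in memory. How the RapidAPI services actually store their data is unknown; the word list here is generated filler:

```python
import time

# Filler data standing in for a large banned-word database.
banned_list = [f"word{i}" for i in range(20_000)]
banned_set = set(banned_list)


def time_lookups(container, probes):
    """Return (number of hits, seconds spent) for membership tests."""
    start = time.perf_counter()
    hits = sum(1 for probe in probes if probe in container)
    return hits, time.perf_counter() - start


# One probe near the end of the list and one that is absent, repeated.
probes = ["word19999", "missing"] * 50

list_hits, list_seconds = time_lookups(banned_list, probes)
set_hits, set_seconds = time_lookups(banned_set, probes)
print(f"list: {list_seconds:.4f}s, set: {set_seconds:.4f}s")
```

The list scans its items one by one for every probe, while the set uses hashing, so a big word database does not have to be CPU-heavy by itself.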

Thanks again for the feedback, Simir

(In reply to Jan Šablatura's message of Fri, 25 Jun 2021, 15:01.)

SIM7K commented 3 years ago

Second thing: I looked at the other APIs and, as far as I can guess, they all seem to lean on AI as a crutch, so they're not great in that respect either. Granted, they detect more stuff, but I think I can outspeed them: it just comes down to testing.

SIM7K / BotAlgos

Great code and how to make it even better :) #1
