Bugswriter / tuxi

Tuxi is a CLI assistant. Get answers to your questions instantly.
GNU General Public License v3.0
1.33k stars, 73 forks

Babe it's time for your daily pull request; Yes honey #167

Closed PureArtistry closed 3 years ago

PureArtistry commented 3 years ago

did my first bit of html scraping today, noticed the scores were the wrong way around for the sports fixture snippet, figured I'd fix it... then I thought I may as well move them up next to the team names... then I re-wrote the whole thing

scrot_20210303-190933

BeyondMagic commented 3 years ago

my god, that's so big

PureArtistry commented 3 years ago

what do you mean? the number of lines? the cat?

PureArtistry commented 3 years ago

need to tweak the output for when a match is in progress. haven't been able to test it until now: it doesn't work properly, and it seems it does need recode after all

BeyondMagic commented 3 years ago

> what do you mean? the number of lines? the cat?

Yeah, the number of lines, can you add some comments on the while loop?

PureArtistry commented 3 years ago

will do, even though it's a lot of lines it's actually really efficient as the only external things being run are pup and recode. everything else, all of the formatting, is done exclusively with builtins and most of those are conditions that won't be met on any given run so a lot of that code doesn't get executed
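To illustrate the builtins-only point, here is a minimal sketch (not tuxi's actual code) that lines up team names and scores using only `printf`, which is a builtin in bash and dash. The function name `format_fixture`, the column widths and the sample teams are all made up for the example.

```sh
# Hypothetical helper, not from tuxi: prints two "team  score" rows.
# Usage: format_fixture HOME HOME_SCORE AWAY AWAY_SCORE
format_fixture() {
    # %-16s left-aligns the name in 16 columns, %2s right-aligns the score
    printf '%-16s %2s\n' "$1" "$2"
    printf '%-16s %2s\n' "$3" "$4"
}

format_fixture "Arsenal" 2 "Chelsea" 1
```

No `sed`, `awk` or `cut` is spawned here, so the only process cost is the shell itself.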

BeyondMagic commented 3 years ago

> even though it's a lot of lines it's actually really efficient as the only external things being run are pup and recode. everything else, all of the formatting, is done exclusively with builtins and most of those are conditions that won't be met on any given run so a lot of that code doesn't get executed

ok

PureArtistry commented 3 years ago

I think this might be ready now, will need testing for other sports to see if the formatting works for stuff other than football

BeyondMagic commented 3 years ago

damn, man made a book, good for developers reading in the future

BeyondMagic commented 3 years ago

> will need testing for other sports to see if the formatting works for stuff other than football

image

for american football

PureArtistry commented 3 years ago

damn, guess it's going to need a little more work, probably won't be until the weekend though - I bet a lot of the different sports are going to have slightly different arrangements

luckily it seems (judging by that image) the league name is always on the third line ($sf3) I can just do a case statement on that for the different league names and have a format for each, I might be able to get away with something like: NFL, NBL, NBA, NHL and use *) for the football leagues or something like that
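The case statement idea above could be sketched roughly like this. It assumes the league name really does arrive in `$sf3`; the function name `pick_layout` and the layout labels are placeholders for the example, not names from tuxi.

```sh
# Hypothetical sketch: choose an output layout from the league name.
# $1 stands in for $sf3 (the third line of the scraped fixture snippet).
pick_layout() {
    case "$1" in
        NFL) layout="nfl" ;;
        NBL) layout="nbl" ;;
        NBA) layout="nba" ;;
        NHL) layout="nhl" ;;
        *)   layout="football" ;;  # fallback for the football leagues
    esac
    printf '%s\n' "$layout"
}

pick_layout "NFL"
pick_layout "Premier League"
```

Anything not listed explicitly falls through to the `*)` branch, so an unknown league gets the football formatting rather than an error.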

BeyondMagic commented 3 years ago

yeah, formatting it beautifully like you did will take a lot of time for different sports, is there any way we can generalize it and make it less, let's say, pretty?

PureArtistry commented 3 years ago

the only way to generalise it would be to put it back how it was with it just spitting out one bit of info per line and you just have to make sense of it

with each different sport (and different "states" of those games within each sport) the order of a lot of the info changes, there do seem to be some constants though, like the league being on line 3

I doubt it would really take that long for me to format the rest of the stuff, I just need cache files for the different sports and match states. I know nothing about american sports so I'll need help knowing what keywords to use. if anybody wants to post the file created by using the -s option for sports they know, it will make doing the other formatting much easier and faster

BeyondMagic commented 3 years ago

I mean, in certain ways that's good, but one day someone will need to maintain all those scrapings, and with more and more conditions this will get harder. But some conditions, like whether the match is still happening, can be generalized as well, can't they? Instead of just printing everything on one line, why not define states for the sports and print the scores on a separate line, etc.? Anyway, this is all talk, so I will try to do it right now.

BeyondMagic commented 3 years ago

I'm just having too many problems with loading matches overall, I'm giving up...

Genghius commented 3 years ago

> my god, that's so big

that's what she said.

Bugswriter commented 3 years ago

that's such a cool feature. Amazing job. You are truly a pro. is it ready? tomorrow is my final exam. Your PR will get passed more quickly.

BeyondMagic commented 3 years ago

> > my god, that's so big
>
> that's what she said.

BREAKING NEWS: man with anime picture just got a girl, UNBELIEVABLE!

Genghius commented 3 years ago

> > > my god, that's so big
>
> > that's what she said.
>
> BREAKING NEWS: man with anime picture just got a girl, UNBELIEVABLE!

nevermind, we broke up. She made me choose between my waifu and her.

PureArtistry commented 3 years ago

just the define snippet to go

@Genghius my condolences for your relationship woes

PureArtistry commented 3 years ago

was busy yesterday, sorry.

had to re-work the define snippet entirely, the last version worked great until the format of the html deviated slightly and then it fell on its arse and just spat out unreadable junk :(

started from scratch, scraping a different div and the results are much better now anyway, plus this one seems bulletproof.

scrot_20210307-095618

@Bugswriter - I think this may be ready now unless anybody has found any bugs?

Bugswriter commented 3 years ago

that was so good. I love this define function and your system. Waiting to merge this with main.

PureArtistry commented 3 years ago

> that was so good. I love this define function and your system. Waiting to merge this with main.

thanks! :)

PureArtistry commented 3 years ago

found something to trip the new define snippet up slightly (it's not that bad), working on a fix now though, shouldn't take long

PureArtistry commented 3 years ago

sorted (until the next word that has inconsistent formatting in the html)

and speaking of which, just found another - another easy fix but my food is ready so commit will be in about an hour

PureArtistry commented 3 years ago

I have realised that this define snippet is only going to work if your language is english. going to need to figure out the different lang codes, get the translations for word types (noun, verb etc) and update the snippet with all the different lang variations

PureArtistry commented 3 years ago

bollocks, I'm going to need help with making this work with other languages. I just tried checking out the definitions in french and realised it's going to take me ages to get all the various word type translations.

going to tweak the script to only use the new define if your lang is en_*

Genghius commented 3 years ago

Let's just force the language to be english and leave the actual language to the user's common sense. (weather °C/°F flashbacks intensify)

PureArtistry commented 3 years ago

@Genghius - that's kind of what it is now, I don't mind putting in the work and making it work for other languages if somebody else is willing to send me a list of all possible word types for their language (according to how google displays it)

I looked for a list in french for instance and the list I found was slightly different to how it was displayed on the google results which is why I gave up on the idea of doing it myself - I don't know enough foreign langs for it to be efficient to try

same sort of thing for the sport fixture stuff, I've got an updated formatting for football but I'm going to need other people to send me html for other sports at various different game states so I can update the formatting for those, again I don't know enough about other sports to really know what to go searching for

PureArtistry commented 3 years ago

@Bugswriter - aside from bug fixes (if needed) this is going to be my last commit to this, just going to wait now until everything's merged before I start fiddling with it again.

BeyondMagic commented 3 years ago

In japanese it's completely different, the older version gives the meanings, though

image

PureArtistry commented 3 years ago

@BeyondMagic I'm betting now that that was the rich snippet rather than define, may need to mess with priority a bit more again. if you do -a it should have the same answer as the old one

I can't copy/paste from your image to test myself

BeyondMagic commented 3 years ago

Nevermind, I was using the wrong version, the new one gives a second meaning and an example

the query is "好 meaning", I think you need to change the TUXI_LANG env, but this isn't a big problem though. I can resolve it for myself, the question is how big can this end up getting because of other languages? I'm afraid the code of define will be unmaintainable for other languages.

image

PureArtistry commented 3 years ago

does that look right though? I have no clue what I'm looking at (and is that with tuxi's lang set to english?)

BeyondMagic commented 3 years ago

this is what Google gives, it's not perfect

image

> and is that with tuxi's lang set to english?

JA_JP. You should add a new parameter in the .github templates (issues) for the language of the system.

PureArtistry commented 3 years ago

scrot_20210307-160414

top one is with my system lang (en_GB) and as you can see if I change my lang to jp it uses a different snippet

how much of a mess does the english version look? (I can't read that)

BeyondMagic commented 3 years ago

> how much of a mess does the english version look?

It's good enough, how did you get the second one?

I can only repeat the first output

PureArtistry commented 3 years ago

> > how much of a mess does the english version look?
>
> It's good enough, how did you get the second one?
>
> I can only repeat the first output

a_define() { # Define (eg: define Aggrandize) //original snippet credit @igaurab
    case "$LANGUAGE" in
    en_*) dfn_use_new=true ;;
    *) dfn_use_new=false ;;
    esac

I used the -l flag to change my language to japanese, it will only use the new define code if your tuxi language is set to english
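For anyone who wants to try the language gate on its own, here is a standalone sketch of the same `case` pattern. The helper name `use_new_define` and the sample locale strings are assumptions for the example; in the excerpt above the check reads `$LANGUAGE` and sets `dfn_use_new` instead.

```sh
# Hypothetical helper: succeeds only for English locales (en_GB, en_US, ...).
use_new_define() {
    case "$1" in
        en_*) return 0 ;;
        *)    return 1 ;;
    esac
}

for lang in en_GB en_US ja_JP fr_FR; do
    if use_new_define "$lang"; then
        printf '%s: new define snippet\n' "$lang"
    else
        printf '%s: old define snippet\n' "$lang"
    fi
done
```

Only the two `en_` locales take the new path; everything else falls back to the old snippet.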

BeyondMagic commented 3 years ago

> I used the -l flag to change my language to japanese, it will only use the new define code if your tuxi language is set to english

I see... nice feature.

PureArtistry commented 3 years ago

I know I said I would stop fiddling but I got bored. was looking to maybe optimise some of the functions, after a bit of testing discovered that the rich snippet was basically redundant and the div that it was scraping caused problems for other snippets

this is why on main you can't put rich above define, rich grabbed from div.XcVN5d. XcVN5d is the suffix used for the headline of answers that appear at the top, the different types of answers have a different prefix. for example kno_val uses Z0LcW.XcVN5d and for definitions on the google page, the word you are looking up is DgZBFd.XcVN5d.frCXef. rich was grabbing that before define could print out what you wanted.
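The overlap can be shown without pup at all. The sketch below mimics how a CSS class selector matches against a `class` attribute; the helper `has_class` is hypothetical, and the attribute values are inferred from the selectors quoted above rather than copied from a real Google page.

```sh
# has_class CLASS "CLASS_ATTRIBUTE": true if CLASS is one of the
# space-separated names in the attribute value.
has_class() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Class attributes assumed from the selectors mentioned in the thread.
kno_val="Z0LcW XcVN5d"
define_word="DgZBFd XcVN5d frCXef"

for attr in "$kno_val" "$define_word"; do
    if has_class XcVN5d "$attr"; then
        printf 'div.XcVN5d matches: %s\n' "$attr"
    fi
    if has_class DgZBFd "$attr"; then
        printf 'div.DgZBFd matches: %s\n' "$attr"
    fi
done
```

The bare `XcVN5d` check succeeds for both elements, while `DgZBFd` only matches the definition headline, which is why a scrape on the suffix alone can grab the word before a more specific snippet gets a chance.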

of the other 2 scrapes in the rich function, one got more lyrics (the 2 lyrics scrapes we have already grab everything) and the other got similar info to kno_right.

so in the end I scrapped the function but left the 2 good (but not needed) div labels in comments in case they become useful again.

also the list snippet tended to mangle a lot of the output

scrot_20210308-173844

this is because all the formatting for the lists was done with html tags, each tag appearing on a new line. google highlights keywords you used in bold and those tags split up the text over multiple lines, messing up the output.

I re-wrote the snippet and instead of having pup strip out the html, I left it in and used it to rebuild the output (mostly) correctly - it's not always exactly as it should be, sometimes there are extra spaces but that's easier to ignore than sentences split across multiple lines in weird places.

scrot_20210308-173947
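A rough sketch of the rebuild-from-tags idea, using builtins only. This is not the snippet from the PR: the function name `rebuild_list`, the `* ` bullet prefix and the sample input are assumptions, and the input is simplified to one tag or text fragment per line, which is how the split output is described above.

```sh
# Hypothetical sketch: rebuild list items from one-tag-per-line html.
# Reads from stdin; `read` without IFS= trims the indentation.
rebuild_list() {
    item=""
    while read -r line; do
        case "$line" in
            "<li>")       item="" ;;                     # start a new item
            "</li>")      printf '* %s\n' "$item" ;;     # item complete
            "<b>"|"</b>") ;;                             # drop bold tags, keep their text
            "")           ;;                             # skip blank lines
            *)            item="${item:+$item }$line" ;; # join fragments with a space
        esac
    done
}

rebuild_list <<'EOF'
<li>
  Install the
  <b>
    package
  </b>
  with your package manager
</li>
<li>
  Run
  <b>
    tuxi
  </b>
</li>
EOF
```

Joining every fragment with a single space is the simplest rule, and it is also where stray spaces can creep in, for example before punctuation that followed a bold keyword.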

PureArtistry commented 3 years ago

still bored, added a new feature; the -u flag (this can close #71)

scrot_20210309-000250

if you add the flag, at the end of every query it will print out google's list of top urls

scrot_20210309-000446

or if you search for something and get no results it will print out automatically

scrot_20210309-000611

the no results for that is a google thing btw, sometimes it gives you the html with the list, sometimes it doesn't - google be strange sometimes (I checked this in the browser too btw)
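The two cases described above (flag given, or no answer found) boil down to one small condition. The sketch below shows only that decision; the function name `should_print_urls` and its arguments are made up for the example and say nothing about how the flag is actually stored in tuxi.

```sh
# Hypothetical sketch of the decision only; names are not from tuxi.
# $1: "true" if the -u flag was given, $2: the answer text (may be empty)
should_print_urls() {
    [ "$1" = "true" ] || [ -z "$2" ]
}

if should_print_urls "false" ""; then
    echo "no answer found, printing the url list"
fi
```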

PureArtistry commented 3 years ago

@Bugswriter - now I think I'm done messing with it (kinda run out of ideas)

BeyondMagic commented 3 years ago

@Bugswriter you there?

Bugswriter commented 3 years ago

yes

BeyondMagic commented 3 years ago

it's time, isn't it?

PureArtistry commented 3 years ago

noticed some activity on here so I took another quick look at the script this morning; small update, found out what the mR2gOd div was for and added a tracklist snippet

scrot_20210317-124145