Open barrytarter opened 2 years ago
I'd like to know more about the spreadsheet and the roster table. Please send me a sample spreadsheet file.
Schema::create('rosters', function (Blueprint $table) {
    $table->id();
    $table->string('university');
    $table->string('url');
    $table->string('sport');
    $table->timestamps();
});
Should we use the TALL stack in our app?
@hardcommitoneself
Here is what @edgrosvenor shared with me -- this will mainly be all back-end functionality so feel free to use whatever you prefer, e.g. browsershot, curl, even python is ok, etc. The early output might be CSVs of the profile data just to check it (e.g. name, position, year in school, etc).
If you are planning to do a front-end piece, TALL would be useful.
Does that make sense?
Thanks for letting me know, @barrytarter. It makes sense. So first I will scrape the basic profile data (name, position, year, etc.) from the URLs provided in the Excel file. I am not sure if you saw my Slack message, where I mentioned that I will use Roach PHP to scrape data from the other sites.
@hardcommitoneself great, yes, this is the best place to reach both me and Ed!
@barrytarter
I just finished the Excel import feature and am now going to build the scraper.
After importing the Excel file, should the scraper run automatically, or should we trigger it manually (e.g. with a "start scraping" button)?
@hardcommitoneself For now, whatever is easiest to get a 'test' version live that successfully pulls and stores data. If @edgrosvenor has any tips, he'll share them here as well.
You'll need to create unique decision rules for pulling the roster data as some rosters are very similar and others are different, e.g. these two are sites that use "Sidearm Sports" templates: https://acusports.com/sports/womens-volleyball/roster https://asugrizzlies.com/sports/mens-soccer/roster
These ones also use Sidearm sports, but a different template I think: https://aupanthers.com/sports/mens-soccer/roster https://bamastatesports.com/sports/womens-volleyball/roster https://auwolves.com/sports/mens-soccer/roster
These are both from Presto Sports templates, but the templates look different: https://goamcats.com/sports/msoc/2017-18/roster https://www.sunyadktimberwolves.com/sports/msoc/2017-18/roster
@barrytarter
How about counting the tr rows of every table on each page?
From what I have seen so far, each page has only one table with many rows, and I think that is the one we want.
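The row-counting idea could be sketched roughly as follows. This is only an illustration in Python (which the thread allows) using the standard library parser, not the Roach PHP spider itself; the names TableRowCounter and find_roster_table and the min_rows threshold are assumptions made for this example.

```python
from html.parser import HTMLParser


class TableRowCounter(HTMLParser):
    """Counts <tr> elements per <table>, in document order."""

    def __init__(self):
        super().__init__()
        self.row_counts = []    # one entry per <table> encountered
        self._open_tables = []  # stack of indexes into row_counts

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.row_counts.append(0)
            self._open_tables.append(len(self.row_counts) - 1)
        elif tag == "tr" and self._open_tables:
            # Attribute the row to the innermost open table.
            self.row_counts[self._open_tables[-1]] += 1

    def handle_endtag(self, tag):
        if tag == "table" and self._open_tables:
            self._open_tables.pop()


def find_roster_table(html, min_rows=5):
    """Return (table_index, row_count) for the table with the most rows.

    Returns None when the page has no table, or when the largest table
    has fewer than min_rows rows (min_rows is an assumed threshold).
    """
    counter = TableRowCounter()
    counter.feed(html)
    if not counter.row_counts:
        return None
    index = max(range(len(counter.row_counts)),
                key=counter.row_counts.__getitem__)
    rows = counter.row_counts[index]
    return (index, rows) if rows >= min_rows else None
```

A None result is a signal that the page needs a different extraction rule, since the table heuristic found nothing usable.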
@hardcommitoneself I like that approach. We might need a way to decipher the type of content listed.
For example, grade level (aka "graduation year") values could be categorized by word: freshman, sophomore, junior, senior. I look forward to seeing how you figure it out!
@barrytarter
I just noticed that some rosters have no tables (they use a list instead): https://www.artuathletics.com/sports/womens-volleyball/roster
I think we need to build logic for the ul list.
This is what I have so far. I think it will be the base of our scraper. Please check it out and give me feedback.
Please take a look at this screenshot.
Notice the Year field. Its value is different from the others.
How can I convert the numbers (1, 3, etc.) to real year values (Fr., Sr., etc.)?
@hardcommitoneself here is one possible guide on how to map the data: https://docs.google.com/spreadsheets/d/1QBCGpvXjoDAH50wQTTnYLj5cWzb3TlXWWUPn-g3kk78/edit?usp=sharing.
Specifically for the numbers, it could map as 1 = Freshman, 2 = Sophomore, 3 = Junior; 4 = Senior; 5 = Senior; 6 = Senior.
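A minimal sketch of that mapping, in Python for illustration. The numeric table follows the values given above; the two-letter prefix matching for words and abbreviations (Fr., Sr., junior, and so on) is an assumption, and normalize_year is a hypothetical helper name.

```python
# Numeric codes as listed in the thread: 1 = Freshman ... 4, 5, 6 = Senior.
NUMERIC_YEARS = {
    1: "Freshman",
    2: "Sophomore",
    3: "Junior",
    4: "Senior",
    5: "Senior",
    6: "Senior",
}

# Assumption: the first two letters are enough to tell the labels apart
# ("Fr.", "freshman", "So.", "Jr.", "junior", "Sr.", "senior").
WORD_PREFIXES = {
    "fr": "Freshman",
    "so": "Sophomore",
    "jr": "Junior",
    "ju": "Junior",
    "sr": "Senior",
    "se": "Senior",
}


def normalize_year(raw):
    """Map a raw roster value to a class year, or None if unrecognized."""
    text = str(raw).strip().lower().rstrip(".")
    if text.isdecimal():
        return NUMERIC_YEARS.get(int(text))
    return WORD_PREFIXES.get(text[:2])
```

Values that do not match either table (for example "Graduate" or a redshirt label) come back as None, so they can be reviewed by hand instead of being guessed.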
@barrytarter
https://www.loom.com/share/262f7d29525f45eba0caa4e8455a965d Please check this video and give me feedback.
@barrytarter @edgrosvenor
Regarding the extra field of the athlete table, should we add the following fields to it?
@hardcommitoneself ,
Thanks for sharing. Can we store both as text for now? The first is a height field and the second is where they played in high school. These are pretty common, so good to collect.
@hardcommitoneself will you be able to begin developing the crawler that will find the missing Twitter and Instagram IDs?
Step 3 in https://docs.google.com/document/d/1YmfAFYu4Cyl99ninB4KAeML4y-nmRW0gzI6Xeydg_2g/edit?usp=drivesdk
Can you get a v1 of that part ready by Wednesday?
@hardcommitoneself Go ahead and add any data that you think might be valuable as key / value pairs in the extra column. While you're at it, enable this package for that column: https://github.com/spatie/laravel-schemaless-attributes That will allow you to do things like $athlete->extra->set('height', '5\'9"'). I think maybe I've included the package in composer (maybe not), but I haven't added the trait to the model.
@barrytarter @edgrosvenor
Regarding the second crawler, I think we can use opendorse.com to scrape our athletes' contact info.
The following is just my opinion.
Search by university name: https://opendorse.com/searchshowAthletesNotOptedInToDeals=true&showUnclaimedAccounts=true&term=Abilene+Christian+University
That's it. I am not sure this approach works for all rosters, so I want to test it with real links.
@barrytarter @edgrosvenor
I wrote my suggestion below.
I think we had better use Google search with the athlete's name, sport, and college for our contact crawler.
I checked manually with many athletes and the results looked good.
Example search queries:
https://www.google.com/search?q=twitter+Nicole+Barham+ACU+soccer
https://www.google.com/search?q=instagram+Nicole+Barham+ACU+soccer
Please take a look and let me know what you think.
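Building those search URLs could look roughly like this. It is a sketch in Python for illustration; build_search_url is a hypothetical helper, and the term order (network, name, college, sport) simply mirrors the example queries in this thread.

```python
from urllib.parse import urlencode


def build_search_url(network, name, college, sport):
    """Build a Google search URL for one athlete and one social network."""
    query = " ".join([network, name, college, sport])
    # urlencode escapes special characters and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


twitter_url = build_search_url("twitter", "Nicole Barham", "ACU", "soccer")
instagram_url = build_search_url("instagram", "Nicole Barham", "ACU", "soccer")
```

Using urlencode rather than string concatenation keeps names with apostrophes, accents, or ampersands from breaking the query string.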
Sure, we can test that out and see how the data looks.
@barrytarter @edgrosvenor
Hi, I hope you are having a nice weekend!
Please take a look at this video: https://www.loom.com/share/96661444867a4df98f6fdef1756662e3 You can see that the scraper is working well. Please give me your feedback.
Sorry to bother you. :)
@hardcommitoneself here are some more good links of rosters. Can you check to see how many profiles you can pull from the rosters (100%?), how much data is filled in for position, height, weight, grad year, from?, how many twitter links you get? how many instagram? How many opendorse?
https://artuathletics.com/sports/mens-soccer/roster https://asugrizzlies.com/sports/mens-soccer/roster https://aupanthers.com/sports/mens-soccer/roster https://adrianbulldogs.com/sports/msoc/roster https://www.albertusfalcons.com/sports/msoc/2022-23/roster https://gobrits.com/sports/mens-soccer/roster https://albrightathletics.com/sports/mens-soccer/roster https://alfredstate.prestosports.com/sports/msoc/2022-23/roster https://gosaxons.com/sports/mens-soccer/roster https://alicelloydeagles.com/sports/msoc/2022-23/roster https://www.ahcbulldogs.com/sports/msoc/2022-23/roster https://www.allegany.edu/athletics/mens-soccer.html https://alleghenygators.com/sports/mens-soccer/roster https://sccstorm.com/sports/msoc/2022-23/roster https://almascots.com/sports/msoc/2022-23/roster https://auwolves.com/sports/mens-soccer/roster https://www.aicyellowjackets.com/sports/msoc/2022-23/roster https://www.arcbeavers.com/sports/msoc/2022-23/roster
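One way to answer the "how much data is filled in" questions is a per-field fill rate over the scraped profiles. The sketch below is in Python for illustration; fill_rates is a hypothetical helper, and the field names in the usage example are assumptions, not the actual athlete table columns.

```python
def fill_rates(athletes, fields):
    """Return the percentage of athletes with a non-empty value per field.

    athletes is a list of dicts (one per scraped profile); a value counts
    as filled when it is not None and not blank after stripping whitespace.
    """
    total = len(athletes)
    rates = {}
    for field in fields:
        filled = sum(
            1 for athlete in athletes
            if str(athlete.get(field) or "").strip()
        )
        rates[field] = round(100 * filled / total, 1) if total else 0.0
    return rates


# Hypothetical sample rows, only to show the shape of the input.
sample = [
    {"name": "A", "position": "GK", "twitter": None},
    {"name": "B", "position": "", "twitter": "https://twitter.com/b"},
    {"name": "C", "position": "MF"},
    {"name": "D", "position": "FW", "twitter": ""},
]
report = fill_rates(sample, ["position", "twitter"])
```

Running the same report per roster URL would show which site templates the scraper handles well and which ones need their own rules.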
@barrytarter @edgrosvenor
Please take a look at the following: https://www.loom.com/share/0708dffa27714d9eb3f0ac3072bb77c7 As a test, I fully automated scraping of Twitter IDs. I think the scraper found almost all of the Twitter IDs, so please check the results manually and give me feedback. I already implemented the Opendorse logic last week, so I need to implement the Instagram logic next.
@hardcommitoneself could we add a method that would allow us to get this person's instagram and twitter? Caleb Kendra at 0:57 you'll see his name in https://www.loom.com/share/0708dffa27714d9eb3f0ac3072bb77c7. e.g. https://www.instagram.com/c_kendra2/
@barrytarter
So, do you want the full Twitter or Instagram link for each athlete, like https://www.instagram.com/c_kendra2/?
@hardcommitoneself yes, we want the twitter, instagram, opendorse links for all athletes in the crawler.
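If both the handle and the full link need to be stored, a found link can be split into its network and handle. This is a rough Python sketch; parse_profile_link is a hypothetical helper, it assumes that the first path segment of a Twitter or Instagram profile URL is the handle, and the NON_PROFILE_SEGMENTS set is a non-exhaustive guess at paths that are not profiles.

```python
from urllib.parse import urlparse

# Hosts this sketch recognizes (assumption: profile URLs use these domains).
KNOWN_HOSTS = {"twitter.com": "twitter", "instagram.com": "instagram"}

# Assumed, non-exhaustive list of first path segments that are not handles.
NON_PROFILE_SEGMENTS = {"p", "explore", "search", "hashtag", "i", "intent", "share"}


def parse_profile_link(url):
    """Return (network, handle) for a profile URL, or None if unrecognized."""
    parsed = urlparse(url if "://" in url else "https://" + url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    network = KNOWN_HOSTS.get(host)
    segments = [segment for segment in parsed.path.split("/") if segment]
    if network is None or not segments:
        return None
    handle = segments[0].lstrip("@")
    if not handle or handle.lower() in NON_PROFILE_SEGMENTS:
        return None
    return network, handle
```

For the link mentioned above, parse_profile_link("https://www.instagram.com/c_kendra2/") yields the network "instagram" and the handle "c_kendra2".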
@barrytarter
OK. As we discussed before, we cannot get social links for many athletes, since most of them don't have any. Anyway, please take a look at the following.
@hardcommitoneself yes, if it doesn't exist, we definitely can't store one.
Caleb Kendra does have one but we didn't store it -- how do we fix that?
@barrytarter
I think we can store it. What's the problem? This is the structure of the athlete table.
Great! Why didn't it store previously?
@barrytarter
Please take a look at this. I implemented the Opendorse scrape method, so we can get not only the Opendorse link but also the Twitter or Instagram link from there.
https://www.loom.com/share/2b18bd1d24a04f5bbdcd018221ab7a4a
@hardcommitoneself if you have any technical questions, feel free to post them in this issue, as that should allow us to document the development process better.