jfcolomer opened this issue 1 year ago
Hi,
Any help understanding how the individual `post` items are created before they are passed to `postInfo = getPostInformation(str(post))` would be really appreciated:
```python
import time

from bs4 import BeautifulSoup

# `scroll` and `getPostInformation` are defined elsewhere in linkdin.py


def scrape(driver, link, profileType):
    if profileType == "Company":
        link = f'{link}/posts/?feedView=ads'
    else:
        link = f'{link}/recent-activity/all/'

    driver.get(link)
    time.sleep(3)

    posts = {}
    old_position = 0
    new_position = None
    counter = 0  # NOTE: never reset, so it keeps growing across chunks and scroll passes

    # Keep scrolling until the scroll position stops changing
    while new_position != old_position:
        # Get old scroll position
        old_position = driver.execute_script(
            ("return (window.pageYOffset !== undefined) ?"
             " window.pageYOffset : (document.documentElement ||"
             " document.body.parentNode || document.body).scrollTop;"))
        # Try removing this sleep to see whether the program runs faster: since the
        # program does processing between scrolls, the sleep may not be needed the
        # way it was on Instagram, where it only scrolled with no processing in between.
        time.sleep(1)

        # Serialize the current DOM and cut it into chunks at each 'occludable-update'
        # class name; each chunk after the first is expected to hold one post
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        soup = str(soup)
        results = soup.split('occludable-update')
        for result in results:
            try:
                counter += 1
                # Take the value after the `counter`-th data-urn=" in this chunk
                postlink = result.split('data-urn="')[counter].split('"')[0]
                postlink = f'https://www.linkedin.com/feed/update/{postlink}'
            except IndexError:
                postlink = ''
            if 'linkedin' in postlink:
                # The raw HTML chunk stored here is what later becomes `post`
                posts[postlink] = result
        new_position = scroll(driver, old_position)

    print(f'\n\nFound {len(posts)} posts.')
    postsFiltered = []
    for postlink, post in posts.items():
        # `post` is the HTML chunk stored above for this link
        postInfo = getPostInformation(str(post))
        postInfo.append(postlink)
        postsFiltered.append(postInfo)
```
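To make the mechanism above concrete, here is a minimal, self-contained sketch of how `posts` is built, run on a small synthetic HTML string. The markup and the `build_posts` helper are hypothetical, not LinkedIn's real page. The page HTML is split on the `occludable-update` class name, each chunk after the first is assumed to start one post container, and the first `data-urn` inside that chunk becomes the key. As one possible variant, this sketch indexes `[1]` within each chunk instead of using the running `counter`.

```python
def build_posts(page_html):
    """Map each post URL to the raw HTML chunk that follows its container marker.

    Hypothetical helper mirroring the split-based logic in scrape(); it assumes
    every post container carries the 'occludable-update' class and a data-urn.
    """
    posts = {}
    # Chunk 0 is everything before the first marker, so skip it
    for chunk in page_html.split('occludable-update')[1:]:
        parts = chunk.split('data-urn="')
        if len(parts) < 2:
            continue  # no data-urn in this chunk
        urn = parts[1].split('"')[0]  # first data-urn of *this* chunk
        posts[f'https://www.linkedin.com/feed/update/{urn}'] = chunk
    return posts


# Synthetic markup, loosely modelled on the attributes the scraper looks for
html = (
    '<div class="feed">'
    '<div class="occludable-update" data-urn="urn:li:activity:1"><p>First post</p></div>'
    '<div class="occludable-update" data-urn="urn:li:activity:2"><p>Second post</p></div>'
    '</div>'
)

posts = build_posts(html)
print(list(posts))
```

Each value in `posts` then contains only that post's own HTML, which is the string `getPostInformation(str(post))` would receive.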
After changing the `link` variable to `link = f'{link}/posts/?feedView=ads'`, I can get the script to export all of the company's promoted posts to the CSV in this format:

- https://www.linkedin.com/feed/update/urn:li:activity:00000000000000001
- https://www.linkedin.com/feed/update/urn:li:activity:00000000000000002
- https://www.linkedin.com/feed/update/urn:li:activity:00000000000000003
- and so on...

But the description, hashtags, etc. only return the values for the first of the posts, in this case https://www.linkedin.com/feed/update/urn:li:activity:00000000000000001, so I'd really appreciate it if you could explain how the `post` variable referenced here https://github.com/1dia100mijar/LinkedinScraperCompanies/blob/8365a6e2ea9cc721fbdbf2341d8b198b06c3289e/linkdin.py#L50 is generated.
Thanks
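One way every link could end up paired with the same HTML, assuming (unverified) that the `?feedView=ads` page contains no `occludable-update` marker: `split` then returns a single chunk, the whole page. Because `counter` is never reset, `result.split('data-urn="')[counter]` would pick a later `data-urn` on each scroll pass, while `posts[postlink] = result` stores the full page every time, so `getPostInformation` would presumably parse the first post for every link. The sketch below replays the loop body from `scrape` for three simulated scroll passes over a static synthetic page; `simulate_scroll_passes` and the markup are hypothetical.

```python
def simulate_scroll_passes(page_html, passes):
    """Replay the inner loop of scrape() over an unchanging page, `passes` times.

    Hypothetical simulation: the markup deliberately has no 'occludable-update'
    marker, which is an assumption about the ads view, not a confirmed fact.
    """
    posts = {}
    counter = 0
    for _ in range(passes):  # one iteration per simulated scroll pass
        results = page_html.split('occludable-update')  # no marker -> [page_html]
        for result in results:
            try:
                counter += 1
                postlink = result.split('data-urn="')[counter].split('"')[0]
                postlink = f'https://www.linkedin.com/feed/update/{postlink}'
            except IndexError:
                postlink = ''
            if 'linkedin' in postlink:
                posts[postlink] = result  # the whole page, every time
    return posts


ads_page = (
    '<div data-urn="urn:li:activity:1"><p>Ad one</p></div>'
    '<div data-urn="urn:li:activity:2"><p>Ad two</p></div>'
    '<div data-urn="urn:li:activity:3"><p>Ad three</p></div>'
)

posts = simulate_scroll_passes(ads_page, passes=3)
print(len(posts))                # 3 distinct links...
print(len(set(posts.values())))  # ...but only 1 distinct HTML chunk
```

If this assumption holds, the number of links collected would track the number of scroll passes rather than the number of ads, and the per-post fields would all come from the start of the page.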
Hi there, thanks for creating this script, it's fabulous! I was wondering what the best way would be to target not every single post but specifically PROMOTED ADS, i.e. the ones listed here: https://www.linkedin.com/company/{company-name}/posts/?feedView=ads

For some reason, when I update the `link` variable in the `scrape` function to something like `link = f'{link}/posts/?feedView=ads'`, it only picks up the very first promoted ad and fails to collect the remaining ones (e.g. with 50 ads, it returns only 1 result). From that one result it also can't collect likes/links (e.g. an ad with a carousel whose items contain links). For all other posts, it works like a charm. Thanks