beezlabs-org / l3-support

L3 support tracker

Crawler Support Fixes and Webscraping Issue #2870112 #75

Closed Sakthi1502 closed 1 year ago

Sakthi1502 commented 1 year ago

Creating this issue to follow up on the ticket raised for the data from two crawler websites and to track support fixes.

Sakthi1502 commented 1 year ago

Update: The client has reported an issue with the loading of the Web Scraping Wave 2 files yesterday. On further checking, problems were identified in the source files takealot-10-1-2023.csv and vintagewinespirits_10-03-23_10.47.csv. A resource needs to be allocated to this, since an ETA has to be conveyed to the client.

Sakthi1502 commented 1 year ago

As discussed in the factory call, @SanthoshKumar2220 has been allocated to this.

Sakthi1502 commented 1 year ago

Please find the status report on October Runs for Crawler Websites. October_Report.xlsx

Items where we need support from the developer:

1. Ticket #2870112
2. Fixes needed for 5/130 websites
3. A check on the websites marked as Site Issues

Sakthi1502 commented 1 year ago

@NaveenKumarD8 has been allocated to the mentioned tasks, as discussed in the developer call.

Sakthi1502 commented 1 year ago

@NaveenKumarD8 I hope this task has started. Let me know if that's not the case.

NaveenKumarD8 commented 1 year ago

Not yet started. I will start the crawler support activity from tomorrow.

koushik-ac commented 1 year ago

@NaveenKumarD8 to start the fixes from today.

koushik-ac commented 1 year ago

Expected ETA - 20/10/2023.

koushik-ac commented 1 year ago

We are on track to close all the issues by 20/10/2023

Dean13b commented 1 year ago

4/8 issues completed.

koushik-ac commented 1 year ago

@NaveenKumarD8 - Kindly update the status here.

Dean13b commented 1 year ago

Planned completion is 20th Oct and review is pending.

NaveenKumarD8 commented 1 year ago

The Ngf site (ngfcoza) has some issues:

1. All the XPaths have changed.
2. The UI shows 20 products, but debugging shows 42; the extra products are not present in the UI.
3. The price does not show for a particular product. If we navigate to the 2nd page and then back to the 1st page, the price appears.

So the ngfcoza site is considered an enhancement.

kirbywine issues:

1. All the selectors have changed.
2. After updating the selectors, the bot returns null.

The bot was built using Html Agility Pack, which no longer works for this site; we have to move it to Selenium. Kirbywine is considered an enhancement.
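To illustrate the "selectors changed, bot returns null" symptom described above, here is a minimal sketch of trying several candidate XPaths in order and flagging the case where none of them match. It uses Python's standard-library `xml.etree.ElementTree` on a made-up, well-formed snippet; the markup, the class names and the `extract_first` helper are all hypothetical and are not taken from the actual bots. It does not address content rendered by JavaScript, which is the stated reason for moving to Selenium.

```python
import xml.etree.ElementTree as ET


def extract_first(root, xpaths):
    """Return (xpath, text) for the first XPath that yields non-empty text, else None."""
    for xpath in xpaths:
        node = root.find(xpath)
        if node is not None and (node.text or "").strip():
            return xpath, node.text.strip()
    return None


# Hypothetical markup: the price used to live in span.price, now in span.amount.
page = ET.fromstring(
    "<div><div class='product'><span class='amount'>R 199.00</span></div></div>"
)

# Old selector first, then a fallback (both are assumptions for illustration).
price_xpaths = [".//span[@class='price']", ".//span[@class='amount']"]

result = extract_first(page, price_xpaths)
if result is None:
    # Surfacing this explicitly is more useful than silently writing nulls.
    print("no selector matched: the page layout has probably changed")
else:
    print(f"matched {result[0]} -> {result[1]}")
```

Logging which selector matched, or that none did, would make a layout change visible during the run rather than at validation time.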

Sakthi1502 commented 1 year ago

Issues found during the validation check that need to be looked at on priority:

- Auckland airport: Currency column is empty.
- Takealot: Columns contain details other than the expected data for each column.
- Amazon: Product count is very low. Kindly check on the website with the search category "Licores".
- Vintagewinespirits: No pricing data available.
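Checks like the ones listed above could be automated along the following lines. This is only a sketch under stated assumptions: the column names `price` and `currency`, the `min_products` threshold and the sample data are hypothetical, and the actual rules are the ones in the linked cleansing rules document.

```python
import csv
import io


def validate_rows(rows, min_products=50):
    """Return a list of human-readable problems found in scraped rows."""
    problems = []
    # Low product count (the threshold is an assumed value).
    if len(rows) < min_products:
        problems.append(f"product count {len(rows)} is below {min_products}")
    # Empty currency or price columns (column names are assumptions).
    for column in ("currency", "price"):
        empty = sum(1 for row in rows if not (row.get(column) or "").strip())
        if empty:
            problems.append(f"{empty} row(s) have an empty '{column}' column")
    return problems


def validate_csv_text(text, min_products=50):
    """Parse CSV text with a header row and validate its rows."""
    return validate_rows(list(csv.DictReader(io.StringIO(text))), min_products)


# Made-up sample data, not taken from the real source files.
sample = "name,price,currency\nWhisky A,19.99,\nGin B,,NZD\n"
for problem in validate_csv_text(sample, min_products=3):
    print(problem)
```

Running such a check right after each crawl would flag empty currency or pricing columns before the files reach the client.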

Validation Rules Link: https://azurebeezlabs-my.sharepoint.com/:w:/r/personal/sakthivel_beezlabs_com/Documents/Crawler%20Cleansing%20Rules.docx?d=w8fd2ae13013c4511b94a0e8647a3c8ff&csf=1&web=1&e=JDrbKL

NaveenKumarD8 commented 1 year ago

The Auckland airport and Vintagewinespirits issues have been fixed. Takealot and Amazon are pending and will be completed by today.

Sakthi1502 commented 1 year ago

@NaveenKumarD8 Kindly provide an update once the pending issues are complete.

NaveenKumarD8 commented 1 year ago

Amazon issues:

1. All the XPaths have changed.
2. Some products do not have a price on the site.
3. There are also challenges with the price formatting: the format occasionally includes words and differs from page to page.
4. Sometimes the bot scrapes around 90 records from the site, while at other times it retrieves fewer than 50.

At the code level there are a lot of changes, which will take around two days of effort. Amazon is considered an enhancement.
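For the price formatting problem mentioned above (prices that sometimes include words and vary between pages), one possible approach is to pull the first number out of the raw text and normalise its separators. The sketch below is an assumption, not the bot's actual code: the `parse_price` helper and the sample strings are hypothetical, and the heuristic treats a separator as the decimal mark only when exactly two digits follow it, so a value like "19.9" would be misread.

```python
import re
from decimal import Decimal

# One or more digit groups joined by "." or "," (e.g. 1,234.56 or 1.234,56).
_NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def parse_price(text):
    """Return the first price found in text as a Decimal, or None if absent."""
    if not text:
        return None
    match = _NUMBER.search(text)
    if match is None:
        return None
    raw = match.group(0)
    last_sep = max(raw.rfind("."), raw.rfind(","))
    # Heuristic: exactly two digits after the last separator means decimals.
    if last_sep != -1 and len(raw) - last_sep - 1 == 2:
        whole, frac = raw[:last_sep], raw[last_sep + 1:]
    else:
        whole, frac = raw, ""
    whole = whole.replace(".", "").replace(",", "")
    return Decimal(f"{whole}.{frac}" if frac else whole)


# Made-up examples of the kinds of strings a listing page might return.
for raw_text in ["Desde $1,234.56 MXN", "1.234,56 €", "Precio: 299", "No disponible"]:
    print(repr(raw_text), "->", parse_price(raw_text))
```

Returning `None` for products with no price keeps the missing-price case explicit, so it can be counted during validation instead of being written as a malformed value.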