Open schlos opened 4 years ago
IF all email fields passed the validation, write status: updated
IF all email fields failed the validation, write status: failed
If one or two of them fail the validation, what should the status be: passed or failed?
@SelectSoft good point. I've updated the definition as follows:
IF any of the email fields failed the validation, write status: failed
ok
saifullahalam, Jun 10, 11:29 AM
I did the validation for mail, foi_officer_mail and website, but the number of attributes website_1, website_2, mail_1, mail_2, foi_officer_mail_1 and foi_officer_mail_2 is not fixed. Sometimes there are 2 and sometimes more than 2. I did not find a way to validate a changing set of variables.
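One possible way to handle a varying number of numbered fields is to discover them from the keys of each row instead of hard-coding them. The following is only a sketch: `find_numbered_fields` and the sample row are hypothetical and not taken from scraper.py, and it assumes each scraped row is available as a dict.

```python
import re


def find_numbered_fields(row, base):
    """Return the keys of `row` named `base` or `base_<n>`, in numeric order.

    `row` is assumed to be a dict of scraped values; `base` is a field name
    such as "mail", "foi_officer_mail" or "website".
    """
    pattern = re.compile(r"^" + re.escape(base) + r"(?:_(\d+))?$")
    matches = []
    for key in row:
        m = pattern.match(key)
        if m:
            # The un-numbered base field sorts first (index 0).
            matches.append((int(m.group(1) or 0), key))
    return [key for _, key in sorted(matches)]


# Hypothetical row with an arbitrary number of mail_<n> columns.
sample_row = {
    "name": "Example d.o.o.",
    "mail": "info@example.hr",
    "mail_2": "sales@example.hr",
    "mail_1": "office@example.hr",
    "mail_10": "other@example.hr",
    "foi_officer_mail": "foi@example.hr",
    "website_1": "https://example.hr",
}

mail_fields = find_numbered_fields(sample_row, "mail")
# mail_fields == ["mail", "mail_1", "mail_2", "mail_10"]
```

The same validation function can then be applied to every key returned, however many there are in a given row.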
New code for email validation: https://github.com/SelectSoft/blue_gene/blob/master/scraper.py#L217 to https://github.com/SelectSoft/blue_gene/blob/master/scraper.py#L225
@SelectSoft I've added this suggestion, but I don't know the correct syntax, so I wrote pseudo code to give you the general idea. Could you check if this is possible?
Current:
https://github.com/SelectSoft/blue_gene/blob/master/scraper.py#L217 to https://github.com/SelectSoft/blue_gene/blob/master/scraper.py#L225
Proposed change / pseudo code - to add validation for all 8 fields with email value:
if(isValidEmail(allData['email'][x]) and base_data["email"].notnull):
    email_validation_pass = "true"
elif(base_data["email"].isnull):
    email_validation_pass = "nan"
else:
    email_validation_pass = "fail"
if(isValidEmail(allData['foi_officer_email'][x]) and base_data["foi_officer_email"].notnull):
    foi_officer_email_validation_pass = "true"
elif(base_data["foi_officer_email"].isnull):
    foi_officer_email_validation_pass = "nan"
else:
    foi_officer_email_validation_pass = "fail"
if(isValidEmail(allData['email_1'][x]) and base_data["email_1"].notnull):
    email_1_validation_pass = "true"
elif(base_data["email_1"].isnull):
    email_1_validation_pass = "nan"
else:
    email_1_validation_pass = "fail"
if(isValidEmail(allData['email_2'][x]) and base_data["email_2"].notnull):
    email_2_validation_pass = "true"
elif(base_data["email_2"].isnull):
    email_2_validation_pass = "nan"
else:
    email_2_validation_pass = "fail"
if(isValidEmail(allData['email_3'][x]) and base_data["email_3"].notnull):
    email_3_validation_pass = "true"
elif(base_data["email_3"].isnull):
    email_3_validation_pass = "nan"
else:
    email_3_validation_pass = "fail"
if(isValidEmail(allData['foi_officer_email_1'][x]) and base_data["foi_officer_email_1"].notnull):
    foi_officer_email_1_validation_pass = "true"
elif(base_data["foi_officer_email_1"].isnull):
    foi_officer_email_1_validation_pass = "nan"
else:
    foi_officer_email_1_validation_pass = "fail"
if(isValidEmail(allData['foi_officer_email_2'][x]) and base_data["foi_officer_email_2"].notnull):
    foi_officer_email_2_validation_pass = "true"
elif(base_data["foi_officer_email_2"].isnull):
    foi_officer_email_2_validation_pass = "nan"
else:
    foi_officer_email_2_validation_pass = "fail"
if(isValidEmail(allData['foi_officer_email_3'][x]) and base_data["foi_officer_email_3"].notnull):
    foi_officer_email_3_validation_pass = "true"
elif(base_data["foi_officer_email_3"].isnull):
    foi_officer_email_3_validation_pass = "nan"
else:
    foi_officer_email_3_validation_pass = "fail"
if(email_validation_pass != "fail" and foi_officer_email_validation_pass != "fail" and email_1_validation_pass != "fail" and email_2_validation_pass != "fail" and email_3_validation_pass != "fail" and foi_officer_email_1_validation_pass != "fail" and foi_officer_email_2_validation_pass != "fail" and foi_officer_email_3_validation_pass != "fail"):
    allData['email_status'][x] = "updated"
else:
    allData['email_status'][x] = "failed"
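For reference, the pseudo code above could be collapsed into a loop over the field names. This is only a sketch under some assumptions: the regex is a deliberately loose stand-in for `isValidEmail`, each row is treated as a plain dict rather than the `allData` / `base_data` structures in scraper.py, and the status follows the definition "if any email field failed, write status: failed".

```python
import re

# Assumption: a simple "something@something.tld" pattern, not the scraper's actual check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

EMAIL_FIELDS = [
    "email", "foi_officer_email",
    "email_1", "email_2", "email_3",
    "foi_officer_email_1", "foi_officer_email_2", "foi_officer_email_3",
]


def is_blank(value):
    """Treat None, float NaN and empty/whitespace-only strings as 'no value'."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is not equal to itself
        return True
    return str(value).strip() == ""


def validate_field(value):
    """Return "nan" for a missing value, "true" for a valid email, "fail" otherwise."""
    if is_blank(value):
        return "nan"
    return "true" if EMAIL_RE.match(str(value).strip()) else "fail"


def validate_row(row):
    """Build the per-field *_validation_pass flags plus the overall email_status."""
    results = {
        field + "_validation_pass": validate_field(row.get(field))
        for field in EMAIL_FIELDS
    }
    any_failed = any(flag == "fail" for flag in results.values())
    results["email_status"] = "failed" if any_failed else "updated"
    return results


# Hypothetical row: one valid address, one invalid, the rest missing.
example = validate_row({"email": "info@example.hr", "foi_officer_email": "not-an-email"})
```

Here `example["email_validation_pass"]` is "true", `example["foi_officer_email_validation_pass"]` is "fail", the missing fields are "nan", and `example["email_status"]` is "failed".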
@SelectSoft please check the following:
The line with VAT number 37927943647 has an email in the field 'foi_officer_email' = '[CENSORSED]@ekokong.hr', but the result is 'email_validation_pass' = 'nan' (nan means no email in any email field).
There are multiple lines like this where one email is present but the result is 'nan'. Could you check it out?
Actually, ignore my last comment. I see you've added an additional field named 'foi_officer_email_validation_pass' for validating this field. This looks fine.
@SelectSoft functionality-wise, all looks good with email validation.
I have an additional request. In the fields
email_validation_pass | website_validation_pass | foi_officer_email_validation_pass
we currently have the following values:
Could we change the wording to use the same system? The expected result would be something like:
Thanks!
Where to implement the change: "Morph script" - the script that parses the TJV register and persists it to the db https://morph.io/SelectSoft/blue_gene
Current: Sometimes, due to human error, processing is not done properly, or is skipped or stopped, so the resulting Morph.io database has invalid records.
Expected: Add checks in the Morph.io scraper for each email field:
email
foi_officer_email
email_1
email_2
foi_officer_email_1
foi_officer_email_2
See also: https://www.pythoncentral.io/how-to-validate-an-email-address-using-python/
Add email validation
Workflow:
After scraping public TJV register, add a RegEx check in each email field when value is not null (!=null).
Based on the result of the RegEx, write to the status field for that processed row:
status: updated
status: failed
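A minimal sketch of this workflow, for illustration only: the pattern below is an assumed, intentionally loose RegEx (not the one from the linked article or from scraper.py), `row_status` is a hypothetical helper, and each scraped row is assumed to be a dict with null values stored as None.

```python
import re

# Assumption: loose "local@domain.tld" pattern; real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")

EMAIL_FIELDS = (
    "email", "foi_officer_email",
    "email_1", "email_2",
    "foi_officer_email_1", "foi_officer_email_2",
)


def row_status(row):
    """Return "failed" if any non-null email field fails the RegEx, else "updated"."""
    for field in EMAIL_FIELDS:
        value = row.get(field)
        if value is None:
            # Null values are skipped: the check only runs when a value is present.
            continue
        if EMAIL_RE.fullmatch(value.strip()) is None:
            return "failed"
    return "updated"


good = row_status({"email": "ured@example.hr", "email_1": None})
bad = row_status({"email": "ured@example.hr", "foi_officer_email": "ured@example"})
```

With these inputs `good` is "updated" and `bad` is "failed". Note that a row with no email values at all also returns "updated" in this sketch; whether that is the desired behaviour is not specified above.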