whol019 closed this issue 6 years ago
Thank you.
Would it be possible to upload a CSV file to write a test case for this?
Best.
On Tue, Jun 5, 2018 at 6:15 PM whol019 notifications@github.com wrote:
Hi Abelardo, I have installed ontask versions 2.5 to 2.7 now. Each time I need to patch the csvupload file, as it currently won't cope with unicode in field names. I am sending you our csvupload file here; I hope it helps. csvupload.py.txt https://github.com/abelardopardo/ontask_b/files/2074169/csvupload.py.txt It basically replaces each column name with unidecode(unicode(text, encoding="utf-8")).

Replace this block:

    # Process CSV file using pandas read_csv
    try:
        data_frame = pandas_db.load_df_from_csvfile(
            request.FILES['file'],
            form.cleaned_data['skip_lines_at_top'],
            form.cleaned_data['skip_lines_at_bottom'])
    except Exception as e:
        form.add_error('file',
                       'File could not be processed ({0})'.format(e.message))
        return render(request,
                      'dataops/upload1.html',
                      {'form': form,
                       'dtype': 'CSV',
                       'dtype_select': 'CSV file',
                       'prev_step': reverse('dataops:list')})

with the following block:

    #################################################################################
    data_frame = pd.read_csv(
        request.FILES['file'],
        index_col=False,
        infer_datetime_format=True,
        quotechar='"',
        skiprows=form.cleaned_data['skip_lines_at_top'],
        skipfooter=form.cleaned_data['skip_lines_at_bottom'],
        encoding='utf-8'
    )

    # Strip white space from all string columns and try to convert to
    # datetime just in case
    cols = {}
    try:
        for x in list(data_frame.columns):
            # Map the original column name to its stripped, ASCII-only form
            y = remove_non_ascii(x.strip())
            cols[x] = y

            if data_frame[x].dtype.name == 'object':
                # Column is a string!
                # data_frame[x] = data_frame[x].str.strip()
                # Try the datetime conversion
                try:
                    series = pd.to_datetime(data_frame[x],
                                            infer_datetime_format=True)
                    # Datetime conversion worked! Update the data_frame
                    data_frame[x] = series
                except ValueError:
                    pass
        data_frame.rename(columns=cols, inplace=True)
        # print(data_frame)
    except Exception as e:
        form.add_error('file',
                       'File could not be processed ({0})'.format(e.message))
        return render(request,
                      'dataops/upload1.html',
                      {'form': form,
                       'dtype': 'CSV',
                       'dtype_select': 'CSV file',
                       'prev_step': reverse('dataops:list')})
    ########################################################################
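The replacement block calls remove_non_ascii, which the snippet itself does not define. A minimal sketch of such a helper, assuming Python 3 and only the standard library, might look like the following. Two assumptions to flag: the attached csvupload.py.txt is described as using unidecode, whereas this sketch swaps in unicodedata normalization, and build_rename_map is a hypothetical name introduced only for illustration.

```python
import unicodedata


def remove_non_ascii(text):
    """Return an ASCII-only approximation of a column name (illustrative helper)."""
    # NFKD splits an accented character into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Anything that cannot be encoded as ASCII (the marks) is dropped.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def build_rename_map(columns):
    """Map each original column name to its stripped, ASCII-only form."""
    return {name: remove_non_ascii(name.strip()) for name in columns}


# "Pr\u00e9nom" is "Prenom" with e-acute, "A\u00f1o" is "Ano" with n-tilde.
cols = build_rename_map([" Pr\u00e9nom ", "A\u00f1o", "email"])
print(cols)
```

The resulting dictionary has the same shape as cols in the block above, so it could be passed to data_frame.rename(columns=cols). Characters with no ASCII decomposition (for example "ß" or CJK characters) are dropped rather than transliterated, so two distinct names could collapse to the same string; unidecode transliterates instead.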
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/abelardopardo/ontask_b/issues/43, or mute the thread https://github.com/notifications/unsubscribe-auth/AAnIiEnfsjba1eqpmMRJVGY2qoKQ2CSYks5t5wLsgaJpZM4UbpEZ .
-- ABELARDO PARDO | Professor and Dean Academic Division of Information Technology, Engineering and the Environment Honorary Associate, School of Electrical and Information Engineering, The University of Sydney Research Fellow, University of Texas at Arlington UNIVERSITY OF SOUTH AUSTRALIA Mawson Lakes Campus (IPC MLK-08) GPO Box 2471 | Adelaide | SA | 5001 T +61 8 8302 3200 | Twitter @abelardopardo E abelardo.pardo@unisa.edu.au abelardo.pardo@sydney.edu.au | W people.unisa.edu.au/Abelardo.Pardo http://people.unisa.edu.au/Abelardo.Pardo Project Lead of OnTaskLearning.org https://ontasklearning.org/ ORCID: 0000-0002-6857-0582 https://orcid.org/0000-0002-6857-0582
Hi Abelardo,
I will webdropoff you our CSV file. I cannot upload it to GitHub as it contains people's names, email addresses, etc.
Cheers,
Wen

From: Abelardo Pardo notifications@github.com
Sent: Thursday, 7 June 2018 2:49 AM
To: abelardopardo/ontask_b ontask_b@noreply.github.com
Cc: whol019 wenchen.hol@gmail.com; Author author@noreply.github.com
Subject: Re: [abelardopardo/ontask_b] csvupload issue with unicode field name (#43)
Thank you.
Would it be possible to upload a CSV file to write a test case for this?
Best.
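Since the real file contains personal data, a test case could instead be built from a synthetic CSV that has non-ASCII column names. The sketch below, assuming Python 3 and only the standard library, generates such a file in memory and reads its header back. The column names, the sample row, and the function names make_sample_csv and read_header are all invented for illustration and are not taken from the actual file.

```python
import csv
import io


def make_sample_csv():
    """Build a small UTF-8 encoded CSV in memory with non-ASCII column names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # Header row with accented characters (e-acute, n-tilde); data is fictional.
    writer.writerow(["Pr\u00e9nom", "A\u00f1o", "email"])
    writer.writerow(["Zo\u00eb", "2018", "zoe@example.com"])
    return buffer.getvalue().encode("utf-8")


def read_header(raw_bytes):
    """Decode the bytes as UTF-8 and return the first row (the column names)."""
    text = raw_bytes.decode("utf-8")
    return next(csv.reader(io.StringIO(text)))


sample = make_sample_csv()
header = read_header(sample)
print(header)
```

The bytes returned by make_sample_csv could be written to a fixture file or wrapped in an upload object for a view test, which keeps real names and email addresses out of the repository.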
Hi Abelardo, I am uploading our patch file for v2.7 again. (Last week we found out that our previous script did not work with v2.7. Very sorry.) We commented out user_passes_test, as it does not seem to work well with our single sign-on middleware patch. csvupload.py.txt
Thank you. Looking into it.
On Mon, Jun 11, 2018 at 11:56 AM whol019 notifications@github.com wrote:
Hi Abelardo, I am uploading our patch file for v2.7 again. (Last week we found out that our previous script did not work with v2.7. Very sorry.) We commented out user_passes_test, as it does not seem to work well with our single sign-on middleware patch. csvupload.py.txt https://github.com/abelardopardo/ontask_b/files/2088518/csvupload.py.txt
-- Abelardo Pardo
The fix for this issue is the same as the one proposed for Issue #98.