Hi Abelardo,

I have now installed OnTask releases from 2.5 to 2.7. Each time I need to patch the csvupload file, because it currently cannot cope with Unicode in field names. I am sending you our csvupload file here and hope it helps: csvupload.py.txt. It basically tries to replace each column name with unidecode(unicode(text, encoding = "utf-8")).

Replace this block (commented out below) with the block that follows it:

# Process CSV file using pandas read_csv

#try:
#    data_frame = pandas_db.load_df_from_csvfile(
#        request.FILES['file'],
#        form.cleaned_data['skip_lines_at_top'],
#        form.cleaned_data['skip_lines_at_bottom'])
#except Exception as e:
#    form.add_error('file',
#                   'File could not be processed ({0})'.format(e.message))
#    return render(request,
#                  'dataops/upload1.html',
#                  {'form': form,
#                   'dtype': 'CSV',
#                   'dtype_select': 'CSV file',
#                   'prev_step': reverse('dataops:list')})

################################################################################
cols = {}
try:
    # Read inside the try block so that parsing or decoding errors are
    # reported through the form instead of raising an unhandled exception
    data_frame = pd.read_csv(
        request.FILES['file'],
        index_col=False,
        infer_datetime_format=True,
        quotechar='"',
        skiprows=form.cleaned_data['skip_lines_at_top'],
        skipfooter=form.cleaned_data['skip_lines_at_bottom'],
        encoding='utf-8'
    )

    # Strip white space from all column names, reduce them to ASCII, and
    # try to convert string columns to datetime just in case
    for x in list(data_frame.columns):
        y = remove_non_ascii(x.strip())
        cols[x] = y

        if data_frame[x].dtype.name == 'object':
            # Column is a string!
            # data_frame[x] = data_frame[x].str.strip()

            # Try the datetime conversion
            try:
                series = pd.to_datetime(data_frame[x],
                                        infer_datetime_format=True)
                # Datetime conversion worked! Update the data_frame
                data_frame[x] = series
            except ValueError:
                pass
    data_frame.rename(columns=cols, inplace=True)
    # print(data_frame)
except Exception as e:
    form.add_error('file',
                   'File could not be processed ({0})'.format(e.message))
    return render(request,
                  'dataops/upload1.html',
                  {'form': form,
                   'dtype': 'CSV',
                   'dtype_select': 'CSV file',
                   'prev_step': reverse('dataops:list')})

########################################################################
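The block above calls remove_non_ascii() without showing its body. A minimal sketch of what such a helper could look like follows. It is an assumption, not the code from the attached file: it uses the standard-library unicodedata module on Python 3 instead of the third-party unidecode package mentioned in the post, and ascii_column_map is a hypothetical name introduced only for illustration.

```python
import unicodedata


def remove_non_ascii(text):
    """Return an ASCII-only approximation of ``text``.

    Hypothetical stand-in for the helper used in the patch. NFKD
    normalization splits accented letters into a base letter plus
    combining marks; encoding to ASCII with errors="ignore" then drops
    the marks and every other character outside ASCII.
    """
    if isinstance(text, bytes):
        # Uploaded data may arrive as UTF-8 bytes; decode before normalizing.
        text = text.decode("utf-8")
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def ascii_column_map(columns):
    """Map each original column name to its stripped, ASCII-only form."""
    return {name: remove_non_ascii(name.strip()) for name in columns}
```

Unlike unidecode, this approach does not transliterate: characters with no ASCII base letter (for example CJK text) are dropped entirely, so two different column names can collapse to the same result. A production version would probably need to detect such collisions before renaming.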

Thank you.

Would it be possible to upload a CSV file to write a test case for this?

Best.

On Tue, Jun 5, 2018 at 6:15 PM whol019 notifications@github.com wrote:

Hi Abelardo, I have now installed OnTask releases from 2.5 to 2.7. Each time I need to patch the csvupload file, because it currently cannot cope with Unicode in field names. I am sending you our csvupload file here and hope it helps: csvupload.py.txt https://github.com/abelardopardo/ontask_b/files/2074169/csvupload.py.txt It basically tries to replace each column name with unidecode(unicode(text, encoding = "utf-8")).

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/abelardopardo/ontask_b/issues/43, or mute the thread https://github.com/notifications/unsubscribe-auth/AAnIiEnfsjba1eqpmMRJVGY2qoKQ2CSYks5t5wLsgaJpZM4UbpEZ .

-- ABELARDO PARDO | Professor and Dean Academic Division of Information Technology, Engineering and the Environment Honorary Associate, School of Electrical and Information Engineering, The University of Sydney Research Fellow, University of Texas at Arlington UNIVERSITY OF SOUTH AUSTRALIA Mawson Lakes Campus (IPC MLK-08) GPO Box 2471 | Adelaide | SA | 5001 T +61 8 8302 3200 | Twitter @abelardopardo E abelardo.pardo@unisa.edu.au abelardo.pardo@sydney.edu.au | W people.unisa.edu.au/Abelardo.Pardo http://people.unisa.edu.au/Abelardo.Pardo Project Lead of OnTaskLearning.org https://ontasklearning.org/ ORCID: 0000-0002-6857-0582 https://orcid.org/0000-0002-6857-0582

Hi Abelardo,

I will send you our CSV file via web drop-off. I cannot upload it to GitHub because it contains people's names, email addresses, etc.

Cheers,
Wen

From: Abelardo Pardo notifications@github.com
Sent: Thursday, 7 June 2018 2:49 AM
To: abelardopardo/ontask_b ontask_b@noreply.github.com
Cc: whol019 wenchen.hol@gmail.com; Author author@noreply.github.com
Subject: Re: [abelardopardo/ontask_b] csvupload issue with unicode field name (#43)

Thank you.

Would it be possible to upload a CSV file to write a test case for this?

Best.
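Since the real file contains personal data, one option for the requested test case is a small synthetic CSV with non-ASCII column names. The sketch below builds such a file in memory with the standard library. Everything in it is an assumption made for illustration: the function names, the column names, and the placeholder rows are invented and do not come from the file discussed in this thread.

```python
import csv
import io


def build_unicode_csv():
    """Return UTF-8 bytes for a small CSV with non-ASCII column names.

    All names and addresses are invented placeholders, not real data.
    """
    header = ["Étudiant", "Correo electrónico", "Größe", "Enrolled"]
    rows = [
        ["Student One", "student1@example.com", "170", "2018-06-05"],
        ["Student Two", "student2@example.com", "165", "2018-06-11"],
    ]
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quotechar='"')
    writer.writerow(header)
    writer.writerows(rows)
    # Encode explicitly so the result matches an uploaded UTF-8 file.
    return buffer.getvalue().encode("utf-8")


def read_header(csv_bytes):
    """Decode the bytes as UTF-8 and return the first row (column names)."""
    reader = csv.reader(io.StringIO(csv_bytes.decode("utf-8"), newline=""))
    return next(reader)
```

In a test, the bytes could be wrapped in io.BytesIO and passed to pandas.read_csv with encoding='utf-8', which accepts file-like objects, to check that the upload step handles the accented column names.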


Hi Abelardo,

I am uploading our patch file for v2.7 here again. (Last week we found out that our previous script did not work with v2.7. Very sorry.) We have commented out user_passes_test because it does not seem to work well with our single sign-on middleware patch. csvupload.py.txt

Thank you. Looking into it.

On Mon, Jun 11, 2018 at 11:56 AM whol019 notifications@github.com wrote:

Hi Abelardo, I am uploading our patch file for v2.7 here again. (Last week we found out that our previous script did not work with v2.7. Very sorry.) We have commented out user_passes_test because it does not seem to work well with our single sign-on middleware patch. csvupload.py.txt https://github.com/abelardopardo/ontask_b/files/2088518/csvupload.py.txt


-- Abelardo Pardo

The fix for this issue is the same as the one proposed for Issue #98.

abelardopardo / ontask_b

csvupload issue with unicode field name #43
