abelardopardo / ontask_b

A platform offering teachers and educational designers the capacity to use data to personalise the learner experience.
https://ontasklearning.org
MIT License

csvupload issue with unicode field name #43

Closed whol019 closed 6 years ago

whol019 commented 6 years ago

Hi Abelardo, I have installed OnTask versions 2.5 through 2.7. Each time I need to patch the csvupload file, because it currently cannot cope with Unicode in field names. I am attaching our csvupload file here in the hope that it helps: csvupload.py.txt https://github.com/abelardopardo/ontask_b/files/2074169/csvupload.py.txt. It basically replaces each column name with unidecode(unicode(text, encoding = "utf-8")).

Replace the commented-out block below with the block that follows it.

# Process CSV file using pandas read_csv

#try:
#    data_frame = pandas_db.load_df_from_csvfile(
#        request.FILES['file'],
#        form.cleaned_data['skip_lines_at_top'],
#        form.cleaned_data['skip_lines_at_bottom'])
#except Exception as e:
#    form.add_error('file',
#                   'File could not be processed ({0})'.format(e.message))
#    return render(request,
#                  'dataops/upload1.html',
#                  {'form': form,
#                   'dtype': 'CSV',
#                   'dtype_select': 'CSV file',
#                   'prev_step': reverse('dataops:list')})

########################################################################
data_frame = pd.read_csv(
    request.FILES['file'],
    index_col=False,
    infer_datetime_format=True,
    quotechar='"',
    skiprows=form.cleaned_data['skip_lines_at_top'],
    skipfooter=form.cleaned_data['skip_lines_at_bottom'],
    encoding='utf-8'
)
# Strip white space from all string columns and try to convert to
# datetime just in case
cols = {}
try:
    for x in list(data_frame.columns):
        y=remove_non_ascii(x.strip())
        cols[x]=y

        if data_frame[x].dtype.name == 'object':

            # Column is a string!
            #data_frame[x] = data_frame[x].str.strip()

            # Try the datetime conversion
            try:
                series = pd.to_datetime(data_frame[x],
                                        infer_datetime_format=True)
                # Datetime conversion worked! Update the data_frame
                data_frame[x] = series
            except ValueError:
                pass
    data_frame.rename(columns=cols, inplace=True )
    #print( data_frame )
except Exception as e:
    form.add_error('file',
                   'File could not be processed ({0})'.format(e.message))
    return render(request,
                  'dataops/upload1.html',
                  {'form': form,
                   'dtype': 'CSV',
                   'dtype_select': 'CSV file',
                   'prev_step': reverse('dataops:list')})

########################################################################
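
The patch above calls remove_non_ascii, which is not defined anywhere in this post; the description only says it amounts to unidecode(unicode(text, encoding = "utf-8")). As a minimal sketch of what such a helper might look like, the following uses Python 3 and the standard-library unicodedata module instead of unidecode. This is a different technique from the one in the attached file, and the ascii_column_map helper is a hypothetical name introduced only for illustration.

```python
import unicodedata


def remove_non_ascii(text):
    """Return an ASCII-only approximation of ``text`` (illustrative helper)."""
    # NFKD splits accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" then drops every non-ASCII code point,
    # including the combining marks produced by the decomposition.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def ascii_column_map(columns):
    """Map each original column name to its stripped, ASCII-only form.

    Hypothetical helper: the result has the same shape as the ``cols``
    dictionary that the patch passes to ``data_frame.rename``.
    """
    return {name: remove_non_ascii(name.strip()) for name in columns}
```

Unlike unidecode, this approach does not transliterate: a character with no ASCII decomposition is simply dropped, so two distinct column names could collapse to the same result and would need to be checked for collisions before renaming.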

abelardopardo commented 6 years ago

Thank you.

Would it be possible to upload a CSV file so that a test case can be written for this?

Best.

whol019 commented 6 years ago

Hi Abelardo, I will send you our CSV file via web drop-off. I cannot upload it to GitHub because it contains people's names, email addresses, etc. Cheers, Wen
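
Since the real file contains personal data, a test case could instead rely on a synthetic file. The following is only a sketch of how such a fixture might be generated with the Python 3 standard library; the headers, rows, and function names are invented for illustration and are not taken from the file discussed in this thread.

```python
import csv
import io


def make_unicode_csv():
    """Build a small UTF-8 CSV, as bytes, whose header row is non-ASCII."""
    rows = [
        ["Prénom", "Correo electrónico", "Größe"],  # invented headers
        ["Alice", "alice@example.com", "170"],      # invented data
        ["Bob", "bob@example.com", "182"],
    ]
    # newline="" leaves the csv module in charge of line terminators.
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode("utf-8")


def read_header(data):
    """Decode the bytes as UTF-8 and return the header row as a list."""
    text = io.StringIO(data.decode("utf-8"), newline="")
    return next(csv.reader(text))
```

In a test, the returned bytes could be wrapped in io.BytesIO and passed to the upload code in place of request.FILES['file'], on the assumption that the code reads it with pandas read_csv and encoding='utf-8' as in the patch above.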

whol019 commented 6 years ago

Hi Abelardo, I am uploading our patch file for v2.7 again. (Last week we found out that our previous script did not work with v2.7. Very sorry.) We commented out user_passes_test, as it does not seem to work well with our single sign-on middleware patch. csvupload.py.txt https://github.com/abelardopardo/ontask_b/files/2088518/csvupload.py.txt

abelardopardo commented 6 years ago

Thank you. Looking into it.

abelardopardo commented 6 years ago

The fix for this issue is the same as the one proposed for Issue #98