alexanderquispe / Diplomado_PUCP

This repository is for the Intensive Python Course at PUCP

Assignment_2 #44

Open anzonyquispe opened 2 years ago

anzonyquispe commented 2 years ago

Dear all,

  1. Follow the instructions in the Jupyter Notebook (JN) named as Assignment_2.
  2. Each group must create their branch named group_#_ass_2_2024_2 (group_1_ass_2_2024_2) and save their results in the Assignment_2 folder. Name your JN like your branch.
  3. Your Pull request should be linked to this issue.
  4. All the questions about the assignment should be posted in this issue.
  5. Follow the same procedure as in the previous assignment. Only the RM (Repository Maintainer) should do the final merge. Since most of you will divide the work, the RM should summarize the work done by each member in a final comment before merging your work into the main branch. The rest of the members should be very explicit and detailed when commenting on their work. Treat this assignment as a simulation of real work.
  6. The RM leads all the assignments and will be the same person during the entire course. If any member does not work, we should be able to see it clearly in the RM's comments and in the branch's commit history.
  7. If we do not see any commit from a member, we will consider that they did not work; they will not be graded and will automatically get 0.

Deadline: 03/08/23 - 18:00

azula89 commented 2 years ago

Good morning, Anzony. You haven't posted Assignment_2 yet, have you?

Yoseph10 commented 2 years ago

Hi Anzony,

When is the deadline for this assignment?

anzonyquispe commented 2 years ago

Hi @Yoseph10 ,

The deadline is at 11:59 am on Saturday 27.

sirkaq commented 2 years ago

Anzony, there's a mistake in the columns for the second question (rec_2). You wrote "CASEID, V201, V2018, V301, V302, ..., V323, V323A, V325A, V326, V327, V337, V359, V360, V361, V362, V363, V364, V367, V372, V372A, V375A, V376, V376A, V379, V380". There is no "V2018" in the data. Additionally, should we select these columns: "CASEID, V201, V218, V301, V302, V323, V323A, V325A, V326, V327, V337, V359, V360, V361, V362, V363, V364, V367, V372, V372A, V375A, V376, V376A, V379, V380"? That would mean selecting only columns V302 and V323, instead of V302 through V323 (the instruction that you wrote). Please help me with this doubt.

anzonyquispe commented 2 years ago

Hi @sirkaq ,

You are right. There are two typos.

  1. The first typo is with the variable V2018. It should be V218.
  2. The second one is the "..." in the list. There are no variables between V302 and V323.

I have already made the corrections. Fetch origin in the main branch to have the latest update of this JN. Thanks for these observations. Please, let me know if everything is clear.

lopezluzmila commented 2 years ago

Hi Anzony, in question 1.1 when I try to get labels from sav file using:

import savReaderWriter as sav
with sav.SavHeaderReader( r"..\..\_data\endes\2019\REC0111.sav", ioUtf8=True) as header:
    metadata = header.all()
    value_labels1 = metadata.valueLabels
    var_labels1 = metadata.varLabels

This message appears: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd3 in position 77: invalid continuation byte". Is it because I have to change something in "REC0111.sav"? (The same happens with RE516171.sav and RE223132.sav.)
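
For context on the error message above: in UTF-8, byte 0xd3 must be followed by a continuation byte, but in single-byte encodings such as Latin-1 it is the letter Ó on its own, which is common in Spanish labels. The sketch below reproduces the same kind of error with a made-up label (not taken from REC0111.sav) and shows that decoding the same bytes as Latin-1 succeeds. That the ENDES files are stored in Latin-1 is an assumption, not something confirmed in this thread.

```python
# Hypothetical label text; the real contents of REC0111.sav are not shown here.
label = "REGIÓN NATURAL"

# Encoded as Latin-1, the letter "Ó" becomes the single byte 0xd3.
raw = label.encode("latin-1")


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short description of the decoding error."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        bad_byte = hex(data[err.start])
        return f"failed: {err.reason} (byte {bad_byte} at position {err.start})"


utf8_result = try_decode(raw, "utf-8")      # fails on the 0xd3 byte
latin1_result = try_decode(raw, "latin-1")  # recovers the original label

print(utf8_result)
print(latin1_result)
```

If the files do turn out to be Latin-1, the reader needs to be told that encoding explicitly rather than assuming UTF-8.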

GuilleSA commented 2 years ago

Hi Anzony, in question 1.2 I don't understand the meaning of the link you gave us as a hint. How can I update the values and vars if I take them as a dictionary? Why don't I update them like in the first question, with 'attrs' instead of a 'for loop'?

Sorry, I've solved it my own way, but I have another question. In the same question I have a problem with rec2_1 and rec3_1. I am using the following expressions to update the variables and values:

new_var_labels1 = { key: var_labels1[ key ] for key in selcol_rec_1 }
new_value_labels1 = { key: value_labels1[ key ] for key in selcol_rec_1 if key in value_labels1.keys() }

They work only with rec1_1 but not with the other ones. The errors: Captura de pantalla 2021-11-25 021343

anzonyquispe commented 2 years ago

Hi @lopezluzmila, did you try it without ioUtf8?

lopezluzmila commented 2 years ago

Yes, Anzony, but I get the same message.

image

anzonyquispe commented 2 years ago

Hi @GuilleSA ,

I think the problem with your code is that you are trying to find the key in the column names of the new data frame. I suggest finding the column name in the keys. I would try this:

selected_cols_rec2_1 = rec2_1.columns
{ col: var_labels2[ col ] for col in selected_cols_rec2_1 }

Please let me know if this clears up your doubts.

anzonyquispe commented 2 years ago

Hi @lopezluzmila ,

Did you open the sav file when you tried to import it to Python? I opened a specific issue to discuss this problem. Please, use it for replies.

anzonyquispe commented 2 years ago

Hi @lopezluzmila, I opened a specific issue to discuss this problem. Please, use it for replies.

anzonyquispe commented 2 years ago

We could not solve the issue with sav.SavHeaderReader, so we decided to use a different library. It works for @lopezluzmila and @hansaguirre1. I encourage you, students, to work with this library in your future jobs, since it seems to have fewer bugs than the first one.

pip install pyreadstat

import pyreadstat
rec1, meta = pyreadstat.read_sav( r"../../_data/endes/2019/REC0111.sav" )
value_labels = meta.variable_value_labels
var_labels = meta.column_names_to_labels

ClaudiaVillena27 commented 2 years ago

Hi @anzonyquispe, we have some problems with question 2. When we try to create the new dictionary of values in rec1_1, we get this message:

KeyError                                  Traceback (most recent call last)

in
      2 for value in rec1_1:
      3
----> 4     new_value_labels1[ value ] = value_labels1[ value ]

KeyError: 'CASEID'
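
The KeyError above is what a plain dictionary lookup raises when a column, here CASEID, has no entry in the value-label dictionary (iterating over a data frame yields its column names, and identifier-like columns usually carry no coded values). A minimal sketch of one way around it, using small made-up dictionaries in place of the real ENDES metadata. The names selected_cols, var_labels and value_labels are placeholders, and all label texts are invented.

```python
# Made-up stand-ins for the metadata dictionaries read from a .sav file.
var_labels = {
    "CASEID": "placeholder: label for CASEID",
    "V201": "placeholder: label for V201",
    "V218": "placeholder: label for V218",
}

# Assumption: only coded columns have value labels, so CASEID has no entry.
value_labels = {
    "V201": {0: "placeholder: category 0"},
}

selected_cols = ["CASEID", "V201", "V218"]

# Every selected column has a variable label here, so a direct lookup works.
new_var_labels = {col: var_labels[col] for col in selected_cols}

# Guard the lookup: columns without value labels are skipped
# instead of raising KeyError.
new_value_labels = {
    col: value_labels[col] for col in selected_cols if col in value_labels
}

print(new_var_labels)
print(new_value_labels)
```

The guard `if col in value_labels` is the same idea as the `if key in value_labels1.keys()` condition used earlier in this thread.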

RodrigoGrijalba commented 2 years ago

Hello @anzonyquispe.

In question 1.6, I get the following error: KeyError: "['V007'] not in index".

This implies that one of the years does not include variable V007.

Is this intended?

anzonyquispe commented 2 years ago

Hi @RodrigoGrijalba ,

You were right. V007 is not a column for some sav files. Please, just skip that variable in the for loop.
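
One way to skip a column that is missing in some files is to keep only the wanted columns that each file actually has. The sketch below uses plain lists in place of the real data frames. The file names, the per-file column lists and the helper present_columns are all invented for illustration; which ENDES years lack V007 is not stated in this thread.

```python
wanted = ["CASEID", "V001", "V007"]

# Hypothetical column lists for two files; the real files are not shown here.
columns_by_file = {
    "file_a": ["CASEID", "V001", "V007", "V012"],
    "file_b": ["CASEID", "V001", "V012"],  # no V007 in this one
}


def present_columns(wanted_cols, available_cols):
    """Return the wanted columns that exist, keeping the wanted order."""
    available = set(available_cols)
    return [col for col in wanted_cols if col in available]


selected_by_file = {
    name: present_columns(wanted, cols)
    for name, cols in columns_by_file.items()
}

for name, cols in selected_by_file.items():
    print(name, cols)
```

With a pandas data frame `df`, the same helper could be used as `df[present_columns(wanted, df.columns)]`, so the selection no longer fails when V007 is absent.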

GuilleSA commented 2 years ago

@anzonyquispe I don't know if it is my Jupyter, but I've tried my expression in another document and it runs very well. However, in my branch it fails. It's also weird that the same expression only runs with rec1_1. I don't know what's happening.

anzonyquispe commented 2 years ago

It is difficult to say. Please, send me an email to check the errors together.

rscoletti commented 2 years ago

Hi Anzony! We have some issues using the update method for question 1.3.

image

anzonyquispe commented 2 years ago

Dear Students,

Do not use column V007 for rec1 data. This is because some files do not have this column.

rscoletti commented 2 years ago

Hi Anzony! We have some issues with question 1.3.

image

Can you help us to understand this error?

ReinerCruz commented 2 years ago

Hi, I would like to install the pyreadstat package, and the code I'm using is pip install pyreadstat. But this message appears: "You may need to restart the kernel to use updated packages." I restart the kernel and nothing happens, and I cannot run the code from the beginning. Help, please. Maybe we can have a short Zoom meeting.

anzonyquispe commented 2 years ago

Hi @ReinerCruz ,

Sure. Send me an email with the zoom link.
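
One possible cause of this symptom, not confirmed for this case, is that pip installed the package into a different Python environment than the one the notebook kernel runs. A small check that can be run in a notebook cell after restarting the kernel; the helper name is_installed is made up for this sketch.

```python
import importlib.util
import sys


def is_installed(package_name: str) -> bool:
    """Return True if the running interpreter can locate the package."""
    return importlib.util.find_spec(package_name) is not None


# The interpreter behind the current kernel; the package has to be
# installed into this environment to be importable from the notebook.
print("Kernel interpreter:", sys.executable)
print("pyreadstat importable:", is_installed("pyreadstat"))
```

If the second line prints False, running `%pip install pyreadstat` in a notebook cell and then restarting the kernel is worth trying, since that magic installs into the kernel's own environment.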

lucianafv27 commented 2 years ago

Hey Anzony!

I have a question about the SavHeaderReader library. We've been working on another team member's computer and everything was running OK, but now that I'm trying to run the code on my laptop it's not working.

Screen Shot 2021-11-26 at 7 42 52 PM

Thank you in advance for your help!

anzonyquispe commented 2 years ago

Hi @lucianafv27 ,

Some students had problems with this code. I recommend using this library instead.

pip install pyreadstat

import pyreadstat
rec1, meta = pyreadstat.read_sav( r"../../_data/endes/2019/REC0111.sav" )
value_labels = meta.variable_value_labels
var_labels = meta.column_names_to_labels

ReinerCruz commented 2 years ago

Anzony, I don't know if you're available for the Zoom meeting. This is the link: https://pucp.zoom.us/j/95553883832?pwd=aTQ2ZlVWMURxdmZoVi8vaFR2M3l5Zz09