nostango closed this issue 3 years ago
Hi Rudy! When I run your code I end up with a dataframe of size: 1013 rows × 2639 columns
When generating all_teams you add all the teams that each character is a part of. This is correct, but it comes with duplicates, which can be removed by turning the list into a set and then back into a list (because sets have no duplicates):
all_teams = list(set(all_teams))
all_teams.sort()
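As a minimal sketch of that de-duplication step: the team names below are made up for illustration and are not taken from the actual data. `sorted(set(...))` does the same thing as the two lines above in one call.

```python
# Hypothetical team list with repeats, standing in for the real all_teams
all_teams = ["X-Men", "Avengers", "X-Men", "Defenders", "Avengers"]

# set() drops the duplicates; sorted() returns a new, alphabetically ordered list
all_teams = sorted(set(all_teams))

print(all_teams)  # ['Avengers', 'Defenders', 'X-Men']
```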
Furthermore, you remove the ambiguous characters, which is good for the next exercise, but it is actually easier to make one big dataframe with everyone (heroes + villains + ambiguous) and then sort them out later.
Then, if you remove the people that don't have any alliance:
have_allies = data_teams.drop(columns=['faction']).sum(axis=1) > 0
data_teams = data_teams[have_allies]
you should end up with a 957 rows × 503 columns dataframe. Let me know! :)
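Here is a small sketch of that filtering on a toy dataframe. The character names, team columns, and the faction encoding (1 = hero, 0 = villain, 2 = ambiguous) are assumptions for illustration only, so adapt them to however you label the factions.

```python
import pandas as pd

# Hypothetical stand-in for data_teams: one row per character,
# one 0/1 column per team, plus a 'faction' label column.
# Assumed encoding: 1 = hero, 0 = villain, 2 = ambiguous.
data_teams = pd.DataFrame(
    {
        "Avengers": [1, 0, 0],
        "X-Men": [0, 0, 1],
        "faction": [1, 0, 2],
    },
    index=["Hero A", "Loner B", "Ambiguous C"],
)

# True for characters that belong to at least one team
have_allies = data_teams.drop(columns=["faction"]).sum(axis=1) > 0
data_teams = data_teams[have_allies]

# Sorting out the ambiguous characters later is then a one-line filter
heroes_and_villains = data_teams[data_teams["faction"] != 2]

print(data_teams.shape)           # (2, 3)
print(heroes_and_villains.shape)  # (1, 3)
```

Keeping everyone in one dataframe means the "no allies" filter is applied once, and each exercise can pick the factions it needs afterwards.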
Hey Daniel! Thanks for the help, I got 957 rows x 503 columns for my data!
Perfect! 👍
Issue summary
Some rows are missing. I placed all relevant code here.
Code and output (if applicable)
import os
import numpy as np
import pandas as pd
def get_alliances(char, faction=None): """Return list of alliances for Marvel character.
# lists that contain each character and the list that will store all the teams in the Marvel Universe
heroes = [name for name in os.listdir('data/heroes')]
villains = [name for name in os.listdir('data/villains')]
ambiguous = [name for name in os.listdir('data/ambiguous')]
all_teamsa = []
all_teams = []
# removes the '.txt' in a file in the directory
def remove_txt(fil):
    for i in range(len(fil)):
        word = fil[0]
        newhero = word.rsplit(".", 1)[0]
        fil.remove(fil[0])
        fil.append(newhero)
    return fil

remove_txt(heroes)
remove_txt(villains)
remove_txt(ambiguous)
# will take the teams from each character and then put them in the all teams list
def get_all_teams(charac):
    for l in range(len(charac)):  # gets the number of files in the array of characters
        for team in get_alliances(charac[l]):  # gets each team from the get_alliances function
            all_teams.append(team)
# getting all the teams from the three different categories
get_all_teams(heroes)
get_all_teams(villains)
get_all_teams(ambiguous)
will take the character and return a vector representation of the alliances they are affiliated with
def vector_teams(charac):
# new array that only has heroes and villains
new_all_char = heroes + villains
new_all_char.sort()
for c in new_all_char:
    if len(get_alliances(c)) == 0:
        new_all_char.remove(c)
# creating the target array
target_arr = []
for i in range(len(new_all_char)):
    target_arr.append(0)
enum = dict((c, i) for i, c in enumerate(new_all_char))
# create the target array depending on whether the character in question is a hero, villain, or ambiguous
for c in new_all_char:
    for faction in ["heroes", "villains", "ambiguous"]:
        if c + ".txt" in os.listdir("data/%s" % faction):
            if faction == 'heroes':
                target_arr[enum.get(c)] += 1
d2 = {}
for c in new_all_char:
    d2[c] = vector_teams(c)
# turn the dictionary holding the 2-D matrix into a 2-D list
dataMatrix = list([d2[i] for i in new_all_char])
X_ta = dataMatrix
y_ta = target_arr
data_teams = pd.DataFrame.from_dict(d2, orient='index', columns=all_teams)
data_teams['faction'] = target_arr
data_teams