biopython / biopython

Official git repository for Biopython (originally converted from CVS)
http://biopython.org/

Avoid single quotes in Python examples in Tutorial #1651

Closed: peterjc closed this issue 6 years ago

peterjc commented 6 years ago

Currently our Tutorial is written in LaTeX to produce HTML and PDF output.

The code examples using the double-quote are fine in both HTML and PDF.

The code examples using the single-quote work fine in HTML, but pdflatex replaces them with a typographic (curly) quote symbol in the PDF output. The examples then break when copied and pasted into Python, e.g. #1647 and pull request #1648 for SearchIO.
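To illustrate the breakage, here is a minimal sketch assuming Python 3 and assuming the substituted character is U+2019 (RIGHT SINGLE QUOTATION MARK); the exact character in the PDF may differ. The helper name `is_valid_python` is made up for this example.

```python
def is_valid_python(snippet):
    """Return True if the snippet compiles (it is never executed)."""
    try:
        compile(snippet, "<tutorial>", "exec")
    except SyntaxError:
        return False
    return True


# Plain ASCII single quotes, as written in the .tex source.
straight = "chain = model['A']"
# The same line with typographic quotes (assumed here to be U+2019),
# as it might look after copying from the PDF.
curly = "chain = model[\u2019A\u2019]"

print(is_valid_python(straight))
print(is_valid_python(curly))
```

The first snippet compiles, while the second is rejected by the parser because the typographic quote is not a valid string delimiter.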

Long term we'd like to move away from LaTeX (see #907), but in the short term any Python snippet the user would type should avoid the single quotes in favour of double quotes.

We could enforce this in the TravisCI/Tox style checks with grep?
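As a rough idea of what such a check might look like if written in Python rather than grep, here is a sketch. The function names and the regular expression are assumptions made for illustration, not an existing Biopython script, and the pattern simply looks for a doctest prompt followed somewhere later by a single quote.

```python
import re

# A doctest prompt (">>> " or "... ") followed later on the line by a single quote.
PROMPT_WITH_SINGLE_QUOTE = re.compile(r"(>>>|\.\.\.) .*'")


def find_offending_lines(text):
    """Return (line_number, line) pairs for prompt lines containing a single quote."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if PROMPT_WITH_SINGLE_QUOTE.search(line)
    ]


def check_files(paths):
    """Print offending lines and return a shell-style status (0 means clean)."""
    status = 0
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            for number, line in find_offending_lines(handle.read()):
                print(f"{path}:{number}: {line}")
                status = 1
    return status


# Small in-memory demonstration; the lines are in the style of the Tutorial.
sample = "\n".join([
    ">>> chain = model['A']",
    '>>> handle = open("example.txt")',
    "...     print(record['ID'])",
])
for number, line in find_offending_lines(sample):
    print(f"line {number}: {line}")
```

A style check could call `check_files` on the Tutorial `.tex` files and fail the build when it returns a non-zero status.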

peterjc commented 6 years ago

This grep command seems to catch a lot of examples, looking for anything with the leading >>> prompt indicating a doctest-style Python snippet:

$ grep ">>> .*\'" Doc/Tutorial/*.tex
Doc/Tutorial/chapter_advanced.tex:>>> ftab = FreqTable.read_count(open('myCountFile'))
Doc/Tutorial/chapter_advanced.tex:>>> ftab = FreqTable.read_frequency(open('myFrequencyFile'))
Doc/Tutorial/chapter_appendix.tex:>>> my_info = 'A string\n with multiple lines.'
Doc/Tutorial/chapter_cluster.tex:>>> genetree = record.treecluster(method='s')
Doc/Tutorial/chapter_cluster.tex:>>> exptree = record.treecluster(dist='u', transpose=1)
Doc/Tutorial/chapter_entrez.tex:>>> handle = Entrez.esearch(db="nlmcatalog", term="computational[Journal]", retmax='20')
Doc/Tutorial/chapter_entrez.tex:>>> print("The first 20 are\n{}".format(record['IdList']))
Doc/Tutorial/chapter_entrez.tex:>>> info = record[0]['TitleMainList'][0]
Doc/Tutorial/chapter_entrez.tex:>>> for record in records['PubmedArticle']:
Doc/Tutorial/chapter_kegg.tex:>>> open("ec_5.4.2.2.txt", 'w').write(request.read())
Doc/Tutorial/chapter_motifs.tex:>>> m.counts['A']
Doc/Tutorial/chapter_motifs.tex:>>> m.counts['T', 0]
Doc/Tutorial/chapter_motifs.tex:>>> m.counts['T', 2]
Doc/Tutorial/chapter_motifs.tex:>>> m.counts['T', 3]
Doc/Tutorial/chapter_motifs.tex:>>> m.counts['A',:]
Doc/Tutorial/chapter_motifs.tex:>>> motif = record['Motif 1']
Doc/Tutorial/chapter_motifs.tex:>>> motif['ID'] # Using motif as a dictionary
Doc/Tutorial/chapter_motifs.tex:>>> with open("mytransfacfile.dat", 'w') as out_handle:
Doc/Tutorial/chapter_motifs.tex:>>> print(motifs.write(two_motifs, 'transfac'))
Doc/Tutorial/chapter_motifs.tex:>>> pwm = m.counts.normalize(pseudocounts={'A':0.6, 'C': 0.4, 'G': 0.4, 'T': 0.6})
Doc/Tutorial/chapter_motifs.tex:>>> background = {'A':0.3,'C':0.2,'G':0.2,'T':0.3}
Doc/Tutorial/chapter_motifs.tex:>>> motif.background = {'A': 0.2, 'C': 0.3, 'G': 0.3, 'T': 0.2}
Doc/Tutorial/chapter_motifs.tex:>>> m_reb1.pseudocounts = {'A':0.6, 'C': 0.4, 'G': 0.4, 'T': 0.6}
Doc/Tutorial/chapter_motifs.tex:>>> m_reb1.background = {'A':0.3,'C':0.2,'G':0.2,'T':0.3}
Doc/Tutorial/chapter_pdb.tex:>>> resolution = structure.header['resolution']
Doc/Tutorial/chapter_pdb.tex:>>> keywords = structure.header['keywords']
Doc/Tutorial/chapter_pdb.tex:>>> with open(filename, 'r') as handle:
Doc/Tutorial/chapter_pdb.tex:>>> structure = parser.get_structure('1fat', '1fat.cif')
Doc/Tutorial/chapter_pdb.tex:>>> mmcif_dict = MMCIF2Dict('1FAT.cif')
Doc/Tutorial/chapter_pdb.tex:>>> sc = mmcif_dict['_exptl_crystal.density_percent_sol']
Doc/Tutorial/chapter_pdb.tex:>>> y_list = mmcif_dict['_atom_site.Cartn_y']
Doc/Tutorial/chapter_pdb.tex:>>> io.save('out.pdb')
Doc/Tutorial/chapter_pdb.tex:>>> io.save('gly_only.pdb', GlySelect())
Doc/Tutorial/chapter_pdb.tex:>>> io.save('out.cif')
Doc/Tutorial/chapter_pdb.tex:>>> io.save('out.cif')
Doc/Tutorial/chapter_pdb.tex:>>> residue=chain[(' ', 100, ' ')]
Doc/Tutorial/chapter_pdb.tex:>>> res10 = chain[(' ', 10, ' ')]
Doc/Tutorial/chapter_pdb.tex:>>> n = residue['N'].get_vector()
Doc/Tutorial/chapter_pdb.tex:>>> c = residue['C'].get_vector()
Doc/Tutorial/chapter_pdb.tex:>>> ca = residue['CA'].get_vector()
Doc/Tutorial/chapter_pdb.tex:>>> chain = model['A']
Doc/Tutorial/chapter_pdb.tex:>>> atom = residue['CA']
Doc/Tutorial/chapter_pdb.tex:>>> atom = structure[0]['A'][100]['CA']
Doc/Tutorial/chapter_pdb.tex:>>> atom.disordered_select('A') # select altloc A atom
Doc/Tutorial/chapter_pdb.tex:>>> atom.disordered_select('B') # select altloc B atom
Doc/Tutorial/chapter_pdb.tex:>>> residue.disordered_select('CYS')
Doc/Tutorial/chapter_pdb.tex:>>> structure = p.get_structure('X', 'pdb1fat.ent')
Doc/Tutorial/chapter_pdb.tex:>>> res_list = Selection.unfold_entities(structure, 'R')
Doc/Tutorial/chapter_pdb.tex:>>> atom_list = Selection.unfold_entities(chain, 'A')
Doc/Tutorial/chapter_pdb.tex:>>> residue_list = Selection.unfold_entities(atom_list, 'R')
Doc/Tutorial/chapter_pdb.tex:>>> chain_list = Selection.unfold_entities(atom_list, 'C')
Doc/Tutorial/chapter_pdb.tex:>>> ca1 = residue1['CA']
Doc/Tutorial/chapter_pdb.tex:>>> ca2 = residue2['CA']
Doc/Tutorial/chapter_pdb.tex:>>> exp_ca = hse.calc_hs_exposure(model, option='CA3')
Doc/Tutorial/chapter_pdb.tex:>>> exp_cb=hse.calc_hs_exposure(model, option='CB')
Doc/Tutorial/chapter_pdb.tex:>>> pdbl.retrieve_pdb_file('1FAT')
Doc/Tutorial/chapter_pdb.tex:>>> pl = PDBList(pdb='/data/pdb')
Doc/Tutorial/chapter_phenotype.tex:    >>> record['A02']
Doc/Tutorial/chapter_phenotype.tex:>>> well = record['A02']  
Doc/Tutorial/chapter_phenotype.tex:>>> corrected = record.subtract_control(control='A01')
Doc/Tutorial/chapter_phenotype.tex:>>> record['A01'][63]
Doc/Tutorial/chapter_phenotype.tex:>>> corrected['A01'][63]
Doc/Tutorial/chapter_phenotype.tex:>>> well = record['A02'] 
Doc/Tutorial/chapter_phylo.tex:>>> Phylo.draw_graphviz(tree, prog='dot')
Doc/Tutorial/chapter_phylo.tex:>>> pylab.savefig('phylo-dot.png')  # Creates a PNG file of the same graphic
Doc/Tutorial/chapter_phylo.tex:>>> cmd = PhymlCommandline(input='Tests/Phylip/random.phy')
Doc/Tutorial/chapter_phylo.tex:>>> tree = Phylo.read('Tests/Phylip/random.phy_phyml_tree.txt', 'newick')
Doc/Tutorial/chapter_searchio.tex:>>> blast_qresult = SearchIO.read('my_blast.xml', 'blast-xml')
Doc/Tutorial/chapter_searchio.tex:>>> blat_qresult = SearchIO.read('my_blat.psl', 'blat-psl')
Doc/Tutorial/chapter_searchio.tex:>>> blast_qresult['gi|262205317|ref|NR_030195.1|']
Doc/Tutorial/chapter_searchio.tex:>>> 'gi|262205317|ref|NR_030195.1|' in blast_qresult
Doc/Tutorial/chapter_searchio.tex:>>> 'gi|262205317|ref|NR_030194.1|' in blast_qresult
Doc/Tutorial/chapter_searchio.tex:>>> blast_qresult.index('gi|301171437|ref|NR_035870.1|')
Doc/Tutorial/chapter_searchio.tex:>>> blast_qresult = SearchIO.read('my_blast.xml', 'blast-xml')
Doc/Tutorial/chapter_searchio.tex:>>> blat_qresult = SearchIO.read('my_blat.psl', 'blat-psl')
Doc/Tutorial/chapter_searchio.tex:>>> blast_qresult = SearchIO.read('my_blast.xml', 'blast-xml')
Doc/Tutorial/chapter_searchio.tex:>>> blat_qresult = SearchIO.read('my_blat.psl', 'blat-psl')
Doc/Tutorial/chapter_searchio.tex:>>> blast_qresult = SearchIO.read('my_blast.xml', 'blast-xml')
Doc/Tutorial/chapter_searchio.tex:>>> blat_qresult = SearchIO.read('my_blat.psl', 'blat-psl')
Doc/Tutorial/chapter_searchio.tex:>>> qresult = SearchIO.read('tab_2226_tblastn_003.txt', 'blast-tab')
Doc/Tutorial/chapter_searchio.tex:>>> qresult2 = SearchIO.read('tab_2226_tblastn_007.txt', 'blast-tab', comments=True)
Doc/Tutorial/chapter_searchio.tex:>>> qresults = SearchIO.parse('tab_2226_tblastn_001.txt', 'blast-tab')
Doc/Tutorial/chapter_searchio.tex:>>> qresults2 = SearchIO.parse('tab_2226_tblastn_005.txt', 'blast-tab', comments=True)
Doc/Tutorial/chapter_searchio.tex:>>> idx = SearchIO.index('tab_2226_tblastn_001.txt', 'blast-tab')
Doc/Tutorial/chapter_searchio.tex:>>> idx['gi|16080617|ref|NP_391444.1|']
Doc/Tutorial/chapter_searchio.tex:>>> idx = SearchIO.index('tab_2226_tblastn_005.txt', 'blast-tab', comments=True)
Doc/Tutorial/chapter_searchio.tex:>>> idx['gi|16080617|ref|NP_391444.1|']
Doc/Tutorial/chapter_searchio.tex:>>> idx = SearchIO.index('tab_2226_tblastn_001.txt', 'blast-tab', key_function=key_function)
Doc/Tutorial/chapter_searchio.tex:>>> idx['GI|16080617|REF|NP_391444.1|']
Doc/Tutorial/chapter_searchio.tex:>>> qresults = SearchIO.parse('mirna.xml', 'blast-xml')     # read XML file
Doc/Tutorial/chapter_searchio.tex:>>> SearchIO.write(qresults, 'results.tab', 'blast-tab')    # write to tabular file
Doc/Tutorial/chapter_searchio.tex:>>> SearchIO.convert('mirna.xml', 'blast-xml', 'results.tab', 'blast-tab')
Doc/Tutorial/chapter_seq_objects.tex:>>> my_seq = Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPAC.unambiguous_dna)
Doc/Tutorial/chapter_seq_objects.tex:>>> my_seq = Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPAC.unambiguous_dna)
Doc/Tutorial/chapter_seq_objects.tex:>>> prot_seq = Seq(``ACGT'', generic_protein)
Doc/Tutorial/chapter_seqio.tex:>>> print(gb_vrl[``AB811634.1''].description)
Doc/Tutorial/chapter_seqio.tex:% >>> print(gb_vrl.get_raw(``GQ333173.1''))
Doc/Tutorial/chapter_seqio.tex:>>> print(gb_vrl.get_raw(``AB811634.1''))
Doc/Tutorial/chapter_uniprot.tex:>>> handle = ExPASy.get_prosite_raw('PS00001')
Doc/Tutorial/chapter_uniprot.tex:>>> handle = ExPASy.get_prosite_raw('PS00001')
Doc/Tutorial/chapter_uniprot.tex:>>> handle = ExPASy.get_prosite_raw('PDOC00001')
Doc/Tutorial/chapter_uniprot.tex:>>> handle = ExPASy.get_prosite_entry('PS00001')
Doc/Tutorial/chapter_uniprot.tex:>>> handle = ExPASy.get_prodoc_entry('PDOC00001')

This grep would find the continuation lines of multi-line doctest-style snippets:

$ grep "\.\.\. .*\'" Doc/Tutorial/*.tex
Doc/Tutorial/chapter_blast.tex:...             print('****Alignment****')
Doc/Tutorial/chapter_blast.tex:...             print('sequence:', alignment.title)
Doc/Tutorial/chapter_blast.tex:...             print('length:', alignment.length)
Doc/Tutorial/chapter_blast.tex:...             print('e value:', hsp.expect)
Doc/Tutorial/chapter_blast.tex:...             print(hsp.query[0:75] + '...')
Doc/Tutorial/chapter_blast.tex:...             print(hsp.match[0:75] + '...')
Doc/Tutorial/chapter_blast.tex:...             print(hsp.sbjct[0:75] + '...')
Doc/Tutorial/chapter_blast.tex:...             print('****Alignment****')
Doc/Tutorial/chapter_blast.tex:...             print('sequence:', alignment.title)
Doc/Tutorial/chapter_blast.tex:...             print('length:', alignment.length)
Doc/Tutorial/chapter_blast.tex:...             print('e value:', hsp.expect)
Doc/Tutorial/chapter_blast.tex:...             print(hsp.query[0:75] + '...')
Doc/Tutorial/chapter_blast.tex:...             print(hsp.match[0:75] + '...')
Doc/Tutorial/chapter_blast.tex:...             print(hsp.sbjct[0:75] + '...')
Doc/Tutorial/chapter_blast.tex:...                 print('****Alignment****')
Doc/Tutorial/chapter_blast.tex:...                 print('sequence:', alignment.title)
Doc/Tutorial/chapter_blast.tex:...                 print('length:', alignment.length)
Doc/Tutorial/chapter_blast.tex:...                 print('e value:', hsp.expect)
Doc/Tutorial/chapter_blast.tex:...                     dots = '...'
Doc/Tutorial/chapter_blast.tex:...                     dots = ''
Doc/Tutorial/chapter_entrez.tex:...     status = record['Entrezgene_track-info']['Gene-track']['Gene-track_status']
Doc/Tutorial/chapter_entrez.tex:...     if status.attributes['value']=='discontinued':
Doc/Tutorial/chapter_entrez.tex:...     geneid = record['Entrezgene_track-info']['Gene-track']['Gene-track_geneid']
Doc/Tutorial/chapter_entrez.tex:...     genename = record['Entrezgene_gene']['Gene-ref']['Gene-ref_locus']
Doc/Tutorial/chapter_motifs.tex:...     collection = 'CORE',
Doc/Tutorial/chapter_motifs.tex:...     tax_group = ['vertebrates', 'insects'],
Doc/Tutorial/chapter_motifs.tex:...     tf_class = 'Winged Helix-Turn-Helix',
Doc/Tutorial/chapter_motifs.tex:...     tf_family = ['Forkhead', 'Ets'],
Doc/Tutorial/chapter_motifs.tex:...     motif = motifs.read(handle, 'sites')
Doc/Tutorial/chapter_pdb.tex:...         if residue.get_name()=='GLY':
Doc/Tutorial/chapter_searchio.tex:...     hit.id = hit.id.split('|')[3]   # renames 'gi|301171322|ref|NR_035857.1|' to 'NR_035857.1'
Doc/Tutorial/chapter_seq_annot.tex:description='gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus ... sequence',
Doc/Tutorial/chapter_seq_annot.tex:'gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus ... pPCP1, complete sequence'
Doc/Tutorial/chapter_seq_annot.tex:...         print("%s %s" % (feature.type, feature.qualifiers.get('db_xref')))
Doc/Tutorial/chapter_uniprot.tex:...     print(record['ID'])
Doc/Tutorial/chapter_uniprot.tex:...     print(record['DE'])
Doc/Tutorial/chapter_uniprot.tex:...     ids = re.findall(r'HREF="/uniprot/(\w+)"', html_results)
Doc/Tutorial/chapter_uniprot.tex:...     ids = re.findall(r'href="/cgi-bin/niceprot\.pl\?(\w+)"', html_results)

However, this will still miss things.

Gasta88 commented 6 years ago

I'm working on the doctest parts inside the Tutorial.

peterjc commented 6 years ago

Massive progress with #1668 - good job!

$ grep ">>> .*\'" Doc/Tutorial/*.tex
Doc/Tutorial/chapter_appendix.tex:>>> my_info = 'A string\n with multiple lines.'
Doc/Tutorial/chapter_seq_objects.tex:>>> prot_seq = Seq(``ACGT'', generic_protein)
Doc/Tutorial/chapter_seqio.tex:>>> print(gb_vrl[``AB811634.1''].description)
Doc/Tutorial/chapter_seqio.tex:% >>> print(gb_vrl.get_raw(``GQ333173.1''))
Doc/Tutorial/chapter_seqio.tex:>>> print(gb_vrl.get_raw(``AB811634.1''))

The appendix entry looks like a simple fix, single quotes to double quotes.

The other four entries look completely wrong: that's the LaTeX syntax for typographic opening and closing double quotes. Something must have gone wrong, but again plain single-character double quotes are needed here.
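For entries like these, a small substitution could restore the plain quotes. This is only a sketch: the function name is hypothetical, and it assumes each LaTeX-style quoted string opens and closes on the same line.

```python
import re

# Two backticks open and two apostrophes close a LaTeX-style double quote.
LATEX_DOUBLE_QUOTES = re.compile(r"``(.*?)''")


def straighten_latex_quotes(line):
    """Replace ``text'' with "text", leaving everything else untouched."""
    return LATEX_DOUBLE_QUOTES.sub(r'"\1"', line)


print(straighten_latex_quotes("prot_seq = Seq(``ACGT'', generic_protein)"))
```

Because the pattern requires the opening backticks, an empty Python string written as two apostrophes is left alone.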

$ grep "\.\.\. .*\'" Doc/Tutorial/*.tex
Doc/Tutorial/chapter_seq_annot.tex:description='gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus ... sequence',
Doc/Tutorial/chapter_seq_annot.tex:'gi|45478711|ref|NC_005816.1| Yersinia pestis biovar Microtus ... pPCP1, complete sequence'
Doc/Tutorial/chapter_uniprot.tex:...     ids = re.findall(r'HREF="/uniprot/(\w+)"', html_results)
Doc/Tutorial/chapter_uniprot.tex:...     ids = re.findall(r'href="/cgi-bin/niceprot\.pl\?(\w+)"', html_results)

The sequence annotation matches here are false positives (they have a dot dot dot in the middle of the text) and are part of doctest output, so they must stay as they are.
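One way to avoid this kind of false positive is to anchor the pattern to the start of the line. This is a sketch under the assumption that a continuation prompt always begins the line, after optional indentation; the pattern names are made up for this example. The `Doc/Tutorial/...tex:` prefix in the grep output is added by grep and is not part of the file contents.

```python
import re

# Matches "... " anywhere in the line, like the grep command does.
UNANCHORED = re.compile(r"\.\.\. .*'")
# Only matches when "... " starts the line (optional indentation allowed).
ANCHORED = re.compile(r"^\s*\.\.\. .*'")

# Doctest output with an ellipsis in the middle of the text.
output_line = (
    "description='gi|45478711|ref|NC_005816.1| "
    "Yersinia pestis biovar Microtus ... sequence',"
)
# A genuine continuation line of a multi-line snippet.
code_line = "...     print(record['ID'])"

for line in (output_line, code_line):
    print(bool(UNANCHORED.search(line)), bool(ANCHORED.search(line)))
```

The anchored pattern still flags the genuine continuation line but no longer flags the output line. Lines that legitimately keep their single quotes, such as the regular expressions above, would still need to be excluded by hand.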

I think you are right to leave the regular expressions as they are.

There may of course be other quotes we ought to fix, but not as part of doctest style entries.

peterjc commented 6 years ago

Hmm, the SeqIO chapter changes that added these were made by me some time ago - I wonder if I was using a "helpful" LaTeX editor?

https://github.com/biopython/biopython/commit/694f0b0befda3448d3c1dba32c4b2d198781e478

@Gasta88 Do you want to fix these last few issues, or should I?

Gasta88 commented 6 years ago

Allow me, but it will have to wait.

Never mind, I've done it.

peterjc commented 6 years ago

Great, closing issue. Thank you!